# Bringing Newer glibc into Older Images for MPICH Host Swap
## The Issue to Solve
For containerized MPI applications, the host MPI library and its dependencies are swapped into the container at runtime to enable cross-node communication. However, one critical dependency not brought in is `glibc`. Due to `glibc`'s backward compatibility, the swapped-in MPI library functions correctly in container images with newer `glibc` versions. But for images with older `glibc`, applications often fail to run with the swapped-in MPI library. The `glibc` version in the image must meet or exceed the highest version required by the MPI library or its dependencies.
The error might look like this:
```=
/app/check-mpi: /lib64/libc.so.6: version `GLIBC_2.27' not found (required by /opt/udiImage/modules/mpich/dep/libfabric.so.1)
/app/check-mpi: /lib64/libc.so.6: version `GLIBC_2.26' not found (required by /opt/udiImage/modules/mpich/dep/libfabric.so.1)
/app/check-mpi: /lib64/libm.so.6: version `GLIBC_2.26' not found (required by /opt/udiImage/modules/mpich/dep/libgfortran.so.5)
/app/check-mpi: /lib64/libc.so.6: version `GLIBC_2.27' not found (required by /opt/udiImage/modules/mpich/dep/libcxi.so.1)
/app/check-mpi: /lib64/libc.so.6: version `GLIBC_2.27' not found (required by /opt/udiImage/modules/mpich/dep/libssh.so.4)
/app/check-mpi: /lib64/libc.so.6: version `GLIBC_2.27' not found (required by /opt/udiImage/modules/mpich/dep/libgssapi_krb5.so.2)
/app/check-mpi: /lib64/libselinux.so.1: no version information available (required by /opt/udiImage/modules/mpich/dep/libkrb5support.so.0)
```
In this example, the swapped-in libraries require `glibc` version `2.27` or above, but the image contains `glibc` version `2.25`.
```bash=
[root@nid200005 app]# /lib64/libc-2.25.so
GNU C Library (GNU libc) stable release version 2.25, by Roland McGrath et al.
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 7.2.1 20170915 (Red Hat 7.2.1-2).
Available extensions:
crypt add-on version 2.1 by Michael Glad and others
GNU Libidn by Simon Josefsson
Native POSIX Threads Library by Ulrich Drepper et al
BIND-8.2.3-T5B
libc ABIs: UNIQUE IFUNC
For bug reporting instructions, please see:
<http://www.gnu.org/software/libc/bugs.html>.
```
You can check which `glibc` version a given shared library requires using:
```bash
objdump -T <shared_lib> | grep GLIBC | sed 's/.*GLIBC_\([.0-9]*\).*/\1/g' | sort -V | tail -n 1
```
This command prints the highest `glibc` symbol version required by the given shared library. In this case, the maximum requirement is `GLIBC_2.27`, needed by `libfabric.so.1` among others.
The following commands list the `glibc` version requirements for each swapped-in library:
```bash=
# Print the highest GLIBC symbol version required by each swapped-in library
# (the MPI libraries themselves and everything under dep/).
for i in /usr/lib/shifter/mpich-2.2/*.so* /usr/lib/shifter/mpich-2.2/dep/*.so*; do
    echo "$i"
    objdump -T "$i" | grep GLIBC | sed 's/.*GLIBC_\([.0-9]*\).*/\1/g' | sort -V | tail -n 1
done
```
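To turn this check into a pass/fail test, the highest required version can be compared against the version the image ships. The sketch below is a minimal example and not part of the original workflow: `max_glibc_version` and `version_ge` are hypothetical helper names, and it assumes GNU `sort -V` is available.

```bash
# Hypothetical helpers (names are illustrative, not from the original scripts).

# Read "objdump -T" output on stdin and print the highest GLIBC_x.y version seen.
# GLIBC_PRIVATE is skipped because the pattern requires a leading digit.
max_glibc_version() {
    grep -o 'GLIBC_[0-9][0-9.]*' | sed 's/^GLIBC_//' | sort -V | tail -n 1
}

# Succeed if version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Example (inside the container):
#   required=$(objdump -T /opt/udiImage/modules/mpich/dep/libfabric.so.1 | max_glibc_version)
#   image=$(getconf GNU_LIBC_VERSION | awk '{print $2}')
#   version_ge "$image" "$required" || echo "image glibc $image < required $required"
```

Comparing with `sort -V` rather than a plain string or numeric test matters here, because `2.5` must sort below `2.27`.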
## Constraints on Possible Solutions
- **Runtime Modifications:** Modifying the image at runtime using tools like `patchelf` is impractical, especially for large-scale jobs (`srun -n <many_num_jobs>`).
- **Image Rebuild:** Rebuilding the image with pre-applied fixes is possible but not ideal. A better solution avoids rebuilding.
## First Attempt: Bring a Newer `libc.so`
Adding only `libc.so.6` from the host fails because `glibc` involves multiple interdependent libraries. For example:
```bash=
[root@nid200005 scratch]# LD_LIBRARY_PATH=/scratch/libc_only:$LD_LIBRARY_PATH /app/check-mpi
/app/check-mpi: /lib64/libm.so.6: version `GLIBC_2.26' not found (required by /opt/udiImage/modules/mpich/dep/libgfortran.so.5)
```
Here, `libgfortran.so.5` needs symbols versioned `GLIBC_2.26` from `libm.so.6`, but the image's own `/lib64/libm.so.6` is still the one being loaded.
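The loader's error messages already name every `glibc` component that falls short, which is a quick way to see that `libc.so.6` alone will not be enough. As a rough sketch (the function name `missing_glibc_versions` is hypothetical, and the pattern assumes the message format shown above), the output can be condensed to one line per library and missing version:

```bash
# Hypothetical helper: read loader error output on stdin and print each
# "<library> <missing version>" pair once.
# Assumes the "version ... not found" message format shown in this note.
missing_glibc_versions() {
    # The two lone dots match the quote characters around the version name.
    sed -n 's/^[^:]*: \([^:]*\): version .\(GLIBC_[0-9.]*\). not found.*/\1 \2/p' | sort -u
}

# Example (inside the container):
#   /app/check-mpi 2>&1 | missing_glibc_versions
```

Applied to the error output at the top of this note, this would report both `/lib64/libc.so.6` and `/lib64/libm.so.6`, so at least those two libraries have to be replaced together.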
## Second Attempt: Bring All Related `glibc` Libraries
Even with all related libraries, the following error occurs:
```bash=
[root@nid200005 scratch]# LD_LIBRARY_PATH=/scratch/libc_libm:$LD_LIBRARY_PATH /app/check-mpi
/app/check-mpi: /lib64/libselinux.so.1: no version information available (required by /opt/udiImage/modules/mpich/dep/libkrb5support.so.0)
/app/check-mpi: relocation error: /scratch/libc_libm/libc.so.6: symbol _dl_exception_create, version GLIBC_PRIVATE not defined in file ld-linux-x86-64.so.2 with link time reference
```
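This happens because the dynamic loader (`ld-linux-x86-64.so.2`) and `libc.so.6` are mismatched: the newer `libc.so.6` expects the private symbol `_dl_exception_create` from its matching loader, which the image's older loader does not define. The kernel starts the loader from the path hardcoded into the executable at link time, before `LD_LIBRARY_PATH` is ever consulted, so an alternative loader placed in `LD_LIBRARY_PATH` is ignored.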
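The hardcoded interpreter path lives in the executable's `PT_INTERP` program header and can be inspected without running the program. The sketch below assumes `readelf` from binutils is installed in the image; `elf_interpreter` is a hypothetical helper that extracts the path from `readelf -l` output.

```bash
# Hypothetical helper: read "readelf -l" output on stdin and print the
# interpreter path recorded in the PT_INTERP program header.
elf_interpreter() {
    sed -n 's/.*\[Requesting program interpreter: \(.*\)].*/\1/p'
}

# Example (assumes binutils is available in the image):
#   readelf -l /app/check-mpi | elf_interpreter
```

For the test application this should print the image's `/lib64/ld-linux-x86-64.so.2`, which is the loader that ends up paired with the newer `libc.so.6`.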
## Third Attempt: Can We Overwrite the System `ld-linux-x86-64.so.2`?
Nice try! You cannot.
```bash
[root@nid200005 scratch]# cp /scratch/host_lib64/ld-2.31.so /lib64/
[root@nid200005 scratch]# cp /scratch/host_lib64/ld-linux-x86-64.so.2 /lib64/
cp: overwrite '/lib64/ld-linux-x86-64.so.2'? y
cp: cannot create regular file '/lib64/ld-linux-x86-64.so.2': Text file busy
```
The reason is simple: the file is in use. Almost every dynamically linked binary depends on it, and even `cp` itself relies on it.
```bash
[root@nid200005 /]# ldd /bin/cp
linux-vdso.so.1 (0x00007ffd965b1000)
libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f0981336000)
libacl.so.1 => /lib64/libacl.so.1 (0x00007f098112d000)
libattr.so.1 => /lib64/libattr.so.1 (0x00007f0980f28000)
libc.so.6 => /lib64/libc.so.6 (0x00007f0980b53000)
libpcre.so.1 => /lib64/libpcre.so.1 (0x00007f09808e1000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f09806dd000)
/lib64/ld-linux-x86-64.so.2 (0x00007f0981780000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f09804be000)
```
If we cannot change the system `ld-linux-x86-64.so.2`, how can we use an alternate version?
---
## Fourth Attempt: Using an Alternate `ld-linux-x86-64.so.2`
The path to the interpreter (`ld-linux-x86-64.so.2`) is hardcoded into the binary at link time. The obvious solution is to use `patchelf` to rewrite it. However, we want to avoid modifying the binaries directly.
For reference, here is how to use `patchelf` for this purpose. Note that `/scratch/host_lib64` in this example holds the full suite of `glibc` libraries, not just the loader:
```bash=
# Using release-0.13 branch of https://github.com/NixOS/patchelf.git
# ./bootstrap.sh && ./configure && make -j20 && make install
[root@nid200005 app]# patchelf --set-interpreter /scratch/host_lib64/ld-2.31.so check-mpi
[root@nid200005 app]# LD_LIBRARY_PATH=/scratch/host_lib64 ./check-mpi
Hello from rank 0, on nid200005. (core affinity = 0-255)
```
This brings us to the final solution. What happens if you simply execute `/lib64/ld-linux-x86-64.so.2` on the command line?
```bash=
[root@nid200005 host_lib64]# /lib64/ld-linux-x86-64.so.2
Usage: ld.so [OPTION]... EXECUTABLE-FILE [ARGS-FOR-PROGRAM...]
You have invoked `ld.so', the helper program for shared library executables.
This program usually lives in the file `/lib/ld.so', and special directives
in executable files using ELF shared libraries tell the system's program
loader to load the helper program from this file. This helper program loads
the shared libraries needed by the program executable, prepares the program
to run, and runs it. You may invoke this helper program directly from the
command line to load and run an ELF executable file; this is like executing
that file itself, but always uses this helper program from the file you
specified, instead of the helper program file specified in the executable
file you run. This is mostly of use for maintainers to test new versions
of this helper program; chances are you did not intend to run this program.
--list list all dependencies and how they are resolved
--verify verify that given object really is a dynamically linked
object we can handle
--inhibit-cache Do not use /etc/ld.so.cache
--library-path PATH use given PATH instead of content of the environment
variable LD_LIBRARY_PATH
--inhibit-rpath LIST ignore RUNPATH and RPATH information in object names
in LIST
--audit LIST use objects named in LIST as auditors
```
READ THE OUTPUT CAREFULLY! The solution is right there: the loader can be invoked directly, with the program to run as its argument. It is as simple as running the executable through the alternate `ld.so`:
```bash
[root@nid200005 /]# /scratch/host_lib64/ld-linux-x86-64.so.2 \
> --library-path /scratch/host_lib64:/opt/udiImage/modules/mpich:/opt/udiImage/modules/mpich/dep \
> /app/check-mpi
Hello from rank 0, on nid200005. (core affinity = 0-255)
```
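Typing the loader invocation by hand gets tedious, so it may help to wrap it. The following is only a sketch under the paths used in this note: `run_with_host_glibc`, `HOST_LIB64`, and `MPICH_DIR` are hypothetical names, with defaults taken from the command above.

```bash
# Hypothetical wrapper; the defaults mirror the paths used in this note.
HOST_LIB64=${HOST_LIB64:-/scratch/host_lib64}
MPICH_DIR=${MPICH_DIR:-/opt/udiImage/modules/mpich}

# Run "$@" through the host's dynamic loader instead of the image's one.
run_with_host_glibc() {
    if [ ! -x "$HOST_LIB64/ld-linux-x86-64.so.2" ]; then
        echo "loader not found or not executable: $HOST_LIB64/ld-linux-x86-64.so.2" >&2
        return 127
    fi
    "$HOST_LIB64/ld-linux-x86-64.so.2" \
        --library-path "$HOST_LIB64:$MPICH_DIR:$MPICH_DIR/dep" \
        "$@"
}

# Example:
#   run_with_host_glibc /app/check-mpi
```

Keeping the loader and the library path in one place makes it harder to pair a loader with a `libc.so.6` from a different directory, which is exactly the mismatch seen in the second attempt.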
---
### Verifying the Solution with Multi-Node MPI
```bash=
dingpf@muller:login02:/mscratch/sd/d/dingpf/mpich-glibc-swap> salloc --nodes 2 --qos interactive --time 04:00:00 --constraint cpu
salloc: Pending job allocation 893410
salloc: job 893410 queued and waiting for resources
salloc: job 893410 has been allocated resources
salloc: Granted job allocation 893410
salloc: Waiting for resource configuration
salloc: Nodes nid[200003-200004] are ready for job
dingpf@nid200003:/mscratch/sd/d/dingpf/mpich-glibc-swap> cat script/srun-glibc-swap.sh
srun --ntasks-per-node 4 -N 2 podman-hpc run --rm --mpi \
-v $SCRATCH:/scratch \
ghcr.io/dingp/fedora:26-mpich \
/scratch/host_lib64/ld-linux-x86-64.so.2 \
--library-path /scratch/host_lib64:/opt/udiImage/modules/mpich:/opt/udiImage/modules/mpich/dep \
/app/check-mpi
dingpf@nid200003:/mscratch/sd/d/dingpf/mpich-glibc-swap> . script/srun-glibc-swap.sh
Hello from rank 3, on nid200003. (core affinity = 0-255)
Hello from rank 2, on nid200003. (core affinity = 0-255)
Hello from rank 1, on nid200003. (core affinity = 0-255)
Hello from rank 5, on nid200004. (core affinity = 0-255)
Hello from rank 6, on nid200004. (core affinity = 0-255)
Hello from rank 7, on nid200004. (core affinity = 0-255)
Hello from rank 4, on nid200004. (core affinity = 0-255)
Hello from rank 0, on nid200003. (core affinity = 0-255)
```
It works!
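With more ranks the greeting lines arrive in arbitrary order, so eyeballing them does not scale. As a small sketch (the function name `check_ranks` is hypothetical, and it assumes the `Hello from rank N, on ...` format printed by the test program), the output can be checked for every rank from 0 to N-1:

```bash
# Hypothetical helper: read job output on stdin and succeed only if every
# rank from 0 to $1-1 printed its greeting exactly once.
check_ranks() {
    awk -v n="$1" '
        /^Hello from rank [0-9]+,/ { r = $4; sub(/,$/, "", r); seen[r]++ }
        END {
            for (i = 0; i < n; i++)
                if (seen[i] != 1) { printf "rank %d seen %d times\n", i, seen[i]; bad = 1 }
            exit bad
        }'
}

# Example (2 nodes x 4 tasks per node = 8 ranks):
#   . script/srun-glibc-swap.sh | check_ranks 8
```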
## Wrap-up
Exploring this topic was a fun trip down a rabbit hole. The full example is available in [this repository](https://github.com/dingp/mpich-glibc-swap), which includes:
- [`container/fedora-26.Dockerfile`](https://github.com/dingp/mpich-glibc-swap/blob/main/container/fedora-26.Dockerfile): A Dockerfile for the container image.
- [`app/xthi-mpi.c`](https://github.com/dingp/mpich-glibc-swap/blob/main/app/xthi-mpi.c): Source code for a simple MPI application used for testing (sourced from [NERSC documentation](https://docs.nersc.gov/jobs/affinity/xthi-mpi.c)).
- [`script/create_host_lib64.sh`](https://github.com/dingp/mpich-glibc-swap/blob/main/script/create_host_lib64.sh): A script to gather GLIBC library bundles from the host (valid for `muller` or `perlmutter` as of 2024-12-07). Note: This script could be improved to eliminate hard-coded library versions.
- Three scripts designed for compute nodes:
    - [`script/run-fedora-mpi-it.sh`](https://github.com/dingp/mpich-glibc-swap/blob/main/script/run-fedora-mpi-it.sh): Runs the container interactively with all required libraries volume-mounted, enabling live testing.
    - [`script/srun-glibc-swap.sh`](https://github.com/dingp/mpich-glibc-swap/blob/main/script/srun-glibc-swap.sh): Executes the MPI application using the appropriate `ld.so` loader.
    - [`script/srun-no-glibc-swap.sh`](https://github.com/dingp/mpich-glibc-swap/blob/main/script/srun-no-glibc-swap.sh): Demonstrates failure due to mismatched `GLIBC` versions required by the MPI libraries or their dependencies.
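
On the hard-coded library versions mentioned for `script/create_host_lib64.sh`: one possible way to avoid them is to copy by soname and let `cp -L` follow the symlinks to whatever versioned files the host ships. The sketch below is not the repository's script; `collect_glibc` is a hypothetical name, and both the list of sonames and the `/lib64` source directory are assumptions that may need adjusting for a given host.

```bash
# Hypothetical sketch, not the repository's script: copy the host's dynamic
# loader and core glibc libraries by soname, following symlinks so that no
# versioned file name (such as ld-2.31.so) has to be hard-coded.
collect_glibc() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # Assumed set of glibc components; extend as needed.
    for name in ld-linux-x86-64.so.2 libc.so.6 libm.so.6 libdl.so.2 \
                libpthread.so.0 librt.so.1 libresolv.so.2 libutil.so.1; do
        # Skip components this host does not ship as separate files.
        [ -e "$src/$name" ] || continue
        cp -L "$src/$name" "$dest/$name" || return 1
    done
}

# Example (on the host, assuming its glibc lives in /lib64):
#   collect_glibc /lib64 "$SCRATCH/host_lib64"
```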
### Fun Fact
While searching for a base image with an older `GLIBC` version, I initially considered `RHEL7` equivalents like `ScientificLinux 7` and `CentOS 7`. Unfortunately, these proved unusable due to the lack of active repository mirrors. Without access to mirrors, a minimal DockerHub image becomes useless, as `gcc` and other utilities are required. Fortunately, `Fedora 26` still provides active mirrors and includes an appropriately aged `GLIBC` version.