# AddressSanitizer: heap-use-after-free in GlusterFS clients #3945
Link to bug report: https://github.com/gluster/glusterfs/issues/3945
## Bug Report
Tao encountered the bug several times, but is unable to reproduce due to concurrency.
GlusterFS was configured with 3 servers and 1 client like this:
```
gluster volume create test-volume disperse 3 redundancy 1 $srvs force
```
The bug can also sometimes be triggered with:
```
r0 = open$dir(&(0x7f0000000000)='./file0\x00', 0x40040, 0x0)
r1 = open(&(0x7f0000000040)='./file0\x00', 0x2300, 0x0)
fsetxattr$security_ima(r1, &(0x7f0000000080), 0x0, 0x0, 0x0)
r2 = open(&(0x7f00000000c0)='./file0/file0\x00', 0x100, 0x24)
write$binfmt_aout(r0, &(0x7f0000000640)={{0x108, 0xd8, 0x3, 0x350, 0x22f, 0x9, 0x245, 0x9}, "a30fc845338b1fc576d17087199eeb89296aefe77a34cf64359bf31dcbb5dab07ca85b4b01a39c76def457575040a300a6ae1b78df0ee3b72eeed79c924dfcc320631d34e006729738e0d07c2091e8b22ea5055afc5caaf4f9eb0a2472197c32634d499da949189cd13b4cea467ba55317de10e83608ee5f49821a17c67a7a67f4f87866562f6a92783556ab9cb424887d1a27", ['\x00', '\x00', '\x00', '\x00', '\x00']}, 0x5b3)
```
See the link above for the error trace.
## Error Trace analysis
This error trace shows a heap-use-after-free error that occurred while running the program. The error was detected by AddressSanitizer, which is a tool used to detect memory-related bugs.
The error occurred in thread T6, at a READ of size 4 at address `0x6040000949a8`. The stack trace shows that the error was caused by a function called `_fuse_fd_inherit_directio_ in the file `_fuse-bridge.c`, which was called from the function `fuse_fd_cbk`. The latter function was called by io_stats_open_cbk, which was called by default_open_cbk, and so on.
The error occurred because a region of memory of 44 bytes starting at address `0x604000094990` was freed by thread T9 while thread T6 was still trying to read from it. Specifically, the memory address being accessed (`0x6040000949a8`) was located 24 bytes inside the freed region. The free operation was performed in the function `fuse_fd_ctx_destroy` in the file fuse-bridge.c.
The warning message at the beginning of the trace indicates that AddressSanitizer does not fully support makecontext/swapcontext functions and may produce false positives in some cases, meaning that the error detected may not necessarily be a real bug, but rather a false alarm caused by the limitations of the tool.
### T0
Creates T7 in the following trace:
`#8 0x7ffff6fef0b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)`
Basic call to start the program.
`#7 0x55555556c37a in main /root/glusterfs/glusterfsd/src/glusterfsd.c:2774`
Normal call to `gf_io_run` at the end of `main`.
`#6 0x7ffff7467b7f in gf_io_run /root/glusterfs/libglusterfs/src/gf-io.c:516`
Normal call to `gf_io_main`.
`#5 0x7ffff7467b7f in gf_io_main /root/glusterfs/libglusterfs/src/gf-io.c:431`
The referenced loc 431: `res = gf_io.engine.wait();`. Doesn't really make sense here, since it seems like `gf_event_dispatch` seems to be called from the continuation of the error trace. But if the program took the branch, an error already seemed to have occurred here, because `gf_io_sync` failed.
```c
res = gf_io_sync(handlers->setup, data, GF_IO_HANDLER_TIMEOUT,
GF_IO_HANDLER_RETRIES);
if (caa_likely(res >= 0)) {
res = gf_io.engine.wait(); // <-- 431
gf_io_sync(handlers->cleanup, data, GF_IO_HANDLER_TIMEOUT,
GF_IO_HANDLER_RETRIES);
}
```
`#4 0x7ffff7359f89 in gf_event_dispatch /root/glusterfs/libglusterfs/src/event.c:115`
Async stuff. Dispatch event for notification by kernel through epoll?
`#3 0x7ffff73feaf2 in event_dispatch_epoll /root/glusterfs/libglusterfs/src/event-epoll.c:797`
Epoll dispatch
```
#0 0x7ffff75cd805 in pthread_create (/lib/x86_64-linux-gnu/libasan.so.5+0x3a805)
#1 0x7ffff72feb97 in gf_thread_vcreate /root/glusterfs/libglusterfs/src/common-utils.c:3261
#2 0x7ffff731028d in gf_thread_create /root/glusterfs/libglusterfs/src/common-utils.c:3284
```
Rest is normal thed creation stuff.
In don't see to what purpose it could have created the thread. seems like standard entry into any program.
### T9
Here the error related memory is allocated and freed.
#### Allocation
`#21 0x7ffff71c5608 in start_thread /build/glibc-YYA7BZ/glibc-2.31/nptl/pthread_create.c:477`
Thread start. Creates a fuse specific thread.
`#20 0x7ffff2f6fd8d in fuse_thread_proc /root/glusterfs/xlators/mount/fuse/src/fuse-bridge.c:6326`
call to `gf_async`.
`#19 0x7ffff2f6fd8d in gf_async ../../../../libglusterfs/src/glusterfs/async.h:187`
Call to the callback function.
` #18 0x7ffff2f5dad9 in fuse_dispatch /root/glusterfs/xlators/mount/fuse/src/fuse-bridge.c:6091`
In this case the `fuse_dispatch` function. The loc call the fuse function for execution.
` #17 0x7ffff2f64de5 in fuse_open /root/glusterfs/xlators/mount/fuse/src/fuse-bridge.c:2981`
In this case `fuse_open`, i.e. a new file is opened.
```
#6 0x7ffff2f1ef0c in fuse_resolve_done /root/glusterfs/xlators/mount/fuse/src/fuse-resolve.c:629
#7 0x7ffff2f1ef0c in fuse_resolve_all /root/glusterfs/xlators/mount/fuse/src/fuse-resolve.c:653
#8 0x7ffff2f1ec7c in fuse_resolve /root/glusterfs/xlators/mount/fuse/src/fuse-resolve.c:620
#9 0x7ffff2f1ef59 in fuse_resolve_all /root/glusterfs/xlators/mount/fuse/src/fuse-resolve.c:650
#10 0x7ffff2f1ef59 in fuse_resolve_all /root/glusterfs/xlators/mount/fuse/src/fuse-resolve.c:638
#11 0x7ffff2f1ce7d in fuse_resolve_continue /root/glusterfs/xlators/mount/fuse/src/fuse-resolve.c:668
#12 0x7ffff2f1e1a6 in fuse_resolve_inode /root/glusterfs/xlators/mount/fuse/src/fuse-resolve.c:352
#13 0x7ffff2f1e930 in fuse_resolve /root/glusterfs/xlators/mount/fuse/src/fuse-resolve.c:617
#14 0x7ffff2f1ef59 in fuse_resolve_all /root/glusterfs/xlators/mount/fuse/src/fuse-resolve.c:650
#15 0x7ffff2f1ef59 in fuse_resolve_all /root/glusterfs/xlators/mount/fuse/src/fuse-resolve.c:638
#16 0x7ffff2f1eff5 in fuse_resolve_and_resume /root/glusterfs/xlators/mount/fuse/src/fuse-resolve.c:680
```
FUSE resolve process. Until the fuse operation is resumed below:
`#5 0x7ffff2f67c81 in fuse_fop_resume /root/glusterfs/xlators/mount/fuse/src/fuse-bridge.c:1163`
Simple call to the function to resume.
`#4 0x7ffff2f45641 in fuse_open_resume /root/glusterfs/xlators/mount/fuse/src/fuse-bridge.c:2944`
FUSE file descriptor context is checked and created
`#3 0x7ffff2f2a448 in fuse_fd_ctx_check_n_create /root/glusterfs/xlators/mount/fuse/src/fuse-bridge.c:116`
File descriptor gets locked `__fuse_fd_ctx_check_n_create(this, fd)` gets called. And should be unlocked after successful return.
`#2 0x7ffff2f2a337 in __fuse_fd_ctx_check_n_create /root/glusterfs/xlators/mount/fuse/src/fuse-bridge.c:90`
There is no context for the given file desciptor, so a new one is created in loc 90. `GF_CALLOC` is called to allocate the space for one `fd_ctx`.
```
#0 0x7ffff76a0dc6 in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10ddc6)
#1 0x7ffff735b226 in __gf_calloc /root/glusterfs/libglusterfs/src/mem-pool.c:177
```
Standard library allocation calls.
### Free
```
#6 0x7ffff2f6fd8d in fuse_thread_proc /root/glusterfs/xlators/mount/fuse/src/fuse-bridge.c:6326
#7 0x7ffff71c5608 in start_thread /build/glibc-YYA7BZ/glibc-2.31/nptl/pthread_create.c:477
```
A fuse thread is started. and async is called.
`#5 0x7ffff2f6fd8d in gf_async ../../../../libglusterfs/src/glusterfs/async.h:187`
Callback
`#4 0x7ffff2f5dad9 in fuse_dispatch /root/glusterfs/xlators/mount/fuse/src/fuse-bridge.c:6091`
Call of the `fuse_ops[opcode]` function.
`#3 0x7ffff2f64205 in fuse_release /root/glusterfs/xlators/mount/fuse/src/fuse-bridge.c:3483`
In this case `fuse_release`. Which calls to destroy the `fd_ctx` previously created.
```
#0 0x7ffff76a07cf in __interceptor_free (/lib/x86_64-linux-gnu/libasan.so.5+0x10d7cf)
#1 0x7ffff735be19 in __gf_free /root/glusterfs/libglusterfs/src/mem-pool.c:383
#2 0x7ffff2f2160f in fuse_fd_ctx_destroy /root/glusterfs/xlators/mount/fuse/src/fuse-bridge.c:141
```
Context gets dstroyed with a free call of the standard library function.
### USE AFTER FREE BY T6
```
#24 0x7ffff71c5608 in start_thread /build/glibc-YYA7BZ/glibc-2.31/nptl/pthread_create.c:477
#25 0x7ffff70ea102 in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x122102)
```
Thread creation with standard library calls.
```
#22 0x7ffff74006c0 in event_dispatch_epoll_handler /root/glusterfs/libglusterfs/src/event-epoll.c:631
#23 0x7ffff74006c0 in event_dispatch_epoll_worker /root/glusterfs/libglusterfs/src/event-epoll.c:742
```
Standard epoll prcedures. Call to the socket handler.
`#21 0x7ffff0356b39 in socket_event_handler /root/glusterfs/rpc/rpc-transport/socket/src/socket.c:2710`
Socket handler is started.
`#20 0x7ffff0356b39 in socket_event_handler /root/glusterfs/rpc/rpc-transport/socket/src/socket.c:2790`
```c
// poll_err is an argument
int ret = -1;
if (!poll_err) {
ret = socket_complete_connection(this);
}
...
if (!ret && poll_in) {
ret = socket_event_poll_in(this, !poll_err);
gf_log(this->name, GF_LOG_TRACE,
"(sock:%d) "
"socket_event_poll_in returned %d",
priv->sock, ret);
notify_handled = _gf_true;
}
```
`#19 0x7ffff0356b39 in socket_event_poll_in /root/glusterfs/rpc/rpc-transport/socket/src/socket.c:2399`
Handle the socket event asynchronoulsy. Call to `gf_async`.
`#18 0x7ffff0356b39 in gf_async ../../../../libglusterfs/src/glusterfs/async.h:187`
`#17 0x7ffff03465a6 in socket_event_poll_in_async /root/glusterfs/rpc/rpc-transport/socket/src/socket.c:2358`
`rpc_transportnotify` is called with argument `RPC_TRANSPORT_MSG_RECEIVED`.
```
#15 0x7ffff7225fca in rpc_clnt_notify /root/glusterfs/rpc/rpc-lib/src/rpc-clnt.c:890
#16 0x7ffff721f983 in rpc_transport_notify /root/glusterfs/rpc/rpc-lib/src/rpc-transport.c:52
```
The client gets notified, that msg was received
`#14 0x7ffff7225fca in rpc_clnt_handle_reply /root/glusterfs/rpc/rpc-lib/src/rpc-clnt.c:723`
Client calls the callback function upon reception of notification.
`#13 0x7fffef2069b4 in client4_0_open_cbk /root/glusterfs/xlators/protocol/client/src/client-rpc-fops_v2.c:346`
During the exit of the function `CLIENT_STACK_UNWIND` is called with a file descriptor as an argument.
`#12 0x7fffef0642f2 in ec_open_cbk /root/glusterfs/xlators/cluster/ec/src/ec-inode-read.c:741`
Path to this function from the stack unwind is unclear to me. As normal out of function call `ec_complete(fop)` is called to finalize the file operation. What does the erasure coding do in such situations?
`#11 0x7fffef021c52 in ec_complete /root/glusterfs/xlators/cluster/ec/src/ec-common.c:579`
Continue erasure coding.
```
#8 0x7fffef064ea2 in ec_manager_open /root/glusterfs/xlators/cluster/ec/src/ec-inode-read.c:865
#9 0x7fffef01fed0 in __ec_manager /root/glusterfs/xlators/cluster/ec/src/ec-common.c:3017
#10 0x7fffef0203b9 in ec_resume /root/glusterfs/xlators/cluster/ec/src/ec-common.c:502
```
`ec_resume` calls the erasure coding manager, who calls the `fop->handler()` normally. in this case `ec_manager_open`. Where we are in state `EC_STATE_REPORT` where the callback function is called. In this case `default_open_cbk`.
`#7 0x7ffff7492ae8 in default_open_cbk /root/glusterfs/libglusterfs/src/defaults.c:1216`
Is linked on compilation. no insight.
`#6 0x7fffeed91a28 in gf_utime_open_cbk /root/glusterfs/xlators/features/utime/src/utime-autogen-fops.c:124`
Just a template exists, no insight.
`#5 0x7ffff7492ae8 in default_open_cbk /root/glusterfs/libglusterfs/src/defaults.c:1216`
Is linked on compilation, no insight.
`#4 0x7fffeeccbe0f in mdc_open_cbk /root/glusterfs/xlators/performance/md-cache/src/md-cache.c:2046`
Metadata cache open callback function, where `MDC_STACK_UNWIND` is called.
`#3 0x7ffff7492ae8 in default_open_cbk /root/glusterfs/libglusterfs/src/defaults.c:1216`
Is linked on compilation, no insight.
`#2 0x7fffeec42026 in io_stats_open_cbk /root/glusterfs/xlators/debug/io-stats/src/io-stats.c:2119`
`STACK_UNWIND_STRICT` is called.
`#1 0x7ffff2f6a3f3 in fuse_fd_cbk /root/glusterfs/xlators/mount/fuse/src/fuse-bridge.c:1643`
`fuse_fd_inherit_directio` is called in handling the callback.
`#0 0x7ffff2f6a3f3 in fuse_fd_inherit_directio /root/glusterfs/xlators/mount/fuse/src/fuse-bridge.c:1564`
When trying to create the flags for `fuse_open_out` it tries to access `tmp_fdctx->open_flags` which no loger exists, since it was freed.
```c
tmp_fd = fd_lookup(fd->inode, 0);
if (tmp_fd) {
tmp_fdctx = fuse_fd_ctx_get(this, tmp_fd);
if (tmp_fdctx) {
foo->open_flags &= ~FOPEN_DIRECT_IO;
foo->open_flags |= (tmp_fdctx->open_flags & FOPEN_DIRECT_IO);
}
}
```
It seems that whatever the MDcache provided is no longer valid.
```
fuse_write
fuse_resolve_and_resume(state)
fuse_resolve_all(state)
fuse_resolve
fuse_resolve_done
fuse_fop_resume
fuse_write_resume (state->resume_fn(fn)) state->active_subvol->fops->writev
FUSE_FOP (meta_writev) -> default_writev -> io_stats_writev -> iot_writev -> enqueue(fop_writev_stub)
STACK_WIND
STACK_WIND_COMMON
next_xl_fn (e.g., wb_writev)
STACK_WIND_TAIL
next_xl_fn (e.g., )
IOT_FOP
iot_schedule
do_iot_schedule
__iot_enqueue
__iot_workers_scale
gf_thread_create
iot_worker
call_resume
call_resume_wind
default_writev_resume
mdc_writev
qr_writev
ob_writev
OB_POST_FD
ob_open_and_resume_fd
... call_resume
call_resume_wind
ob_open_resume
default_open
gf_utime_open
default_open
ec_gf_open
ec_open
OB_POST_COMMON
fop_writev_stub
ob_stub_dispatch x
call_resume x
call_resume_wind
ob_writev
default_writev
wb_writev
fop_writev_stub
wb_process_queue
wb_fulfill
wb_fulfill_head
gf_utime_writev
default_writev
ec_gf_writev
ec_writev
ec_manager
__ec_manager
ec_manager_writev
ec_writev_start
```