# Refcounting crash in ANGLE 4 rebase

| | |
| --- | --- |
| Document owner | @ErichDonGubler |
| Last updated | 2022-04-19 |
| Status | **Resolved**. This document's contents are out-of-date. |
| Backlink to rebase report | [Link](https://hackmd.io/XxvU5HgHQVWw-kxKkKGA_A?both=#Refcounting-crash-around-ID3DDevice) |

Currently, Mozilla's rebase of ANGLE v4 (see [here](https://hackmd.io/XxvU5HgHQVWw-kxKkKGA_A) for version/commit details) is running into a crash while calling [`IUnknown::Release`](https://learn.microsoft.com/en-us/windows/win32/api/unknwn/nf-unknwn-iunknown-release) on a specific COM object. This crash can be straightforwardly reproduced in WebGL with relatively simple shaders and `gl` commands that use a [`video` element](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/video) as the source of a 2D texture (see the sample HTML source in the [repro steps](#Reproducing-the-crash) below).

This issue appears to be due to imbalanced COM object reference counting operations on an [`ID3D11Device`](https://learn.microsoft.com/en-us/windows/win32/api/d3d11/nn-d3d11-id3d11device) instance created and assigned to the `Renderer11::mDevice` member in the `rx::Renderer11::callD3D11CreateDevice()` method, such that a double-free occurs. Therefore, one of the following must be true for the COM object pointed to by `rx::Renderer11::mDevice`:

* [`IUnknown::AddRef`](https://learn.microsoft.com/en-us/windows/win32/api/unknwn/nf-unknwn-iunknown-addref) is not being called enough.
* [`IUnknown::Release`](https://learn.microsoft.com/en-us/windows/win32/api/unknwn/nf-unknwn-iunknown-release) is being called too many times.

This imbalance surfaces in whichever of several execution paths happens to run _last_; the paths that we currently know of are:

* In `rx::Renderer11::release()`, where several dozen decrements of `mDevice` occur.
* In `mozilla::gfx::SharedSurface_ANGLEShareHandle::~SharedSurface_ANGLEShareHandle()`, while destroying the `IDXGIKeyedMutex` it extracts from ANGLE (viz., the one assigned to the `mKeyedMutex` member).

It is currently unclear whether this defective behavior is rooted in new ANGLE source, an older issue in Firefox code that is only now being exposed, or something else.

TODO: add source links

## Reproducing the crash

### Dependencies

You will need:

1. A Windows machine with a recent version of the Windows OS. The remaining dependencies here are assumed to be installed on this device.
1. A [working development environment for Firefox](https://firefox-source-docs.mozilla.org/setup/windows_build.html).
1. A 2022 edition of the MSVC toolchain (which should be mostly completed by the previous step), plus some configuration steps.
    :::spoiler
    1. Install the Visual Studio editor as part of this.
        1. You will need the [Child Process Debugging Power Tool](https://marketplace.visualstudio.com/items?itemName=vsdbgplat.MicrosoftChildProcessDebuggingPowerTool) in order to automatically attach to `firefox.exe`'s child processes.
    1. You will need the proper version of the Windows 10 SDK. ATOW, that's version 10.0.20348.0.
        1. The Windows SDK itself can be installed from [this archive page](https://developer.microsoft.com/en-us/windows/downloads/sdk-archive/).
        1. The Windows SDK component for your Visual Studio installation will also need to have a matching version installed. The easiest way to do this is to use the `Visual Studio Installer` application to `Modify` your installation's `Individual components`, like in these screenshots:
            :::spoiler Screenshots
            ![](https://hackmd.io/_uploads/BJwPqIEqo.png)
            ![](https://hackmd.io/_uploads/ByKu9LV5j.png)
            :::
    :::
1. A checkout of Firefox with the ANGLE rebase implemented.
    For reference, Erich most recently reproduced this issue with the revision [`631c67e3`](https://hg.mozilla.org/try/rev/631c67e36dcb1c59f903878e37ec10e45e6c649d), as observable from CI runs in Mozilla's [Treeherder](https://treeherder.mozilla.org/jobs?repo=try&revision=8e078b45b35b4f36785a8970e22805319da77ee4).
1. A static file server to serve the reproduction files with. @ErichDonGubler used [`sfz`](https://crates.io/crates/sfz) with no arguments in a folder with [these files (AKA `repro.zip`)](https://github.com/mozilla/angle/files/10352343/repro.zip).

### Reproduction steps

The basic flow for reproduction is:

1. Change working directory to your Firefox checkout (affectionately referred to as `$GECKO_CHECKOUT` going forward).
1. Run `./mach build` with [optimizations disabled](https://firefox-source-docs.mozilla.org/setup/configuring_build_options.html#optimization) and [debug symbols enabled](https://firefox-source-docs.mozilla.org/contributing/debugging/debugging_on_windows.html#debugging-optimized-builds); for convenience, you can just use the following `mozconfig` file at the root of your checkout of Firefox:
    ```
    ac_add_options --disable-optimize
    ac_add_options --enable-debug
    ```
1. Run `./mach run --debug`. This will generate and open a new Visual Studio solution.
1. Configure the solution's debugging to automatically attach to child processes.
    :::spoiler
    1. Navigate to the app menu strip > `Debug` > `Other Debug Targets` > `Child Process Debugging Settings...`
        ![](https://hackmd.io/_uploads/HkhpnSNqo.png)
    1. Tick the `Enable child process debugging` checkbox, and save it (i.e., <kbd>Ctrl</kbd> + <kbd>S</kbd>).
        ![](https://hackmd.io/_uploads/By0C3rE9i.png)
    :::
1. Configure the invocation of `firefox.exe` to open your `repro.zip`'s `index.html` page automatically.
    :::spoiler
    1. Extract the `repro.zip` archive from earlier into a directory somewhere. From this point, we'll call that "the `repro` directory".
    1. Start serving files from your `repro` directory using a local HTTP server. As mentioned above, @ErichDonGubler used a naive invocation of `sfz`, which serves to port 5000 by default:
        ```
        # With your `repro` directory as the CWD:
        $ sfz
        Files served on http://127.0.0.1:5000
        ```
    1. Open the solution's `Properties` from the `Solution Explorer` view, which by default is on the left side.
        ![](https://hackmd.io/_uploads/H1TERrN9s.png)
    1. Add `-new-tab` arguments that point to where your local file server will be serving up the files from `repro.zip`. Continuing the above example, you can use `-new-tab http://127.0.0.1:5000/index.html`, as seen in this screenshot:
        ![](https://hackmd.io/_uploads/HyTbkUVcj.png)
    :::

All you need to do now is `Debug` in Visual Studio (i.e., use the <kbd>F5</kbd> shortcut), and the problem should reproduce within twenty or so seconds. If this does not happen, it's probably because the `index.html` tab didn't load all the way; refresh the page, and it should happen quickly, like in the following screenshot:

![](https://hackmd.io/_uploads/SyuZWPEqo.png)

:::info
ℹ️ N.B.: you will encounter a debug break from what seems to be an assert in Microsoft code when the refcount of `mDevice` goes to `0`. This is expected (and probably related to Microsoft's code detecting that something is off :sweat_drops:), but is _not_ the crash that we're debugging. Example screenshot:

![](https://hackmd.io/_uploads/BJFg-vV5o.png)
:::

FIXME: s/751/756 in screenshots of breakpoint lines

### Watching the refcount change in Visual Studio

1. Set a breakpoint in Visual Studio at `$GECKO_CHECKOUT/gfx/angle/checkout/src/libANGLE/renderer/d3d/d3d11/Renderer11.cpp:756`, immediately after the call to the `createDevice` function pointer argument within `Renderer11::callD3D11CreateDevice`. Using the address that `mDevice` gets set to, you will be able to get the address of the reference count itself.
2. You can create a data breakpoint in the `Watch` view for the refcount as follows:
    1. Click on the _`Add item to watch`_ row and enter the following expression:
        ```
        (unsigned __int64*)(*(unsigned __int64*)($ADDRESS_OF_COM_PTR + 0x130) + 8)
        ```
        ...where `$ADDRESS_OF_COM_PTR` is the address stored in the `mDevice` variable from the previous step.
    2. You can now set a memory breakpoint on the above `Watch` expression by right-clicking on it, and clicking the `Break When Value Changes` item.
        ![](https://hackmd.io/_uploads/rysIhL4qo.png)

### Generating a log of refcounting operations

Building on the previous section, @ErichDonGubler has generated reports of the callstack for each refcounting operation via the `Output` window in Visual Studio. This is done by adding `Action`s to the settings of the breakpoints created in the previous section. Concretely:

1. Set the following messages on the breakpoints created in the previous section:
    :::warning
    :warning: This content has been outmoded by the live collaboration on GitHub below.
    :::
    :::spoiler Outdated
    * For the `Renderer11::callD3D11CreateDevice` breakpoint, @ErichDonGubler has been using:
        ```
        --!! Starting to track refs `mDevice` at {mDevice}:$CALLSTACK
        ```
        ...and yes, `\n` does not get translated to a newline, but it's workable in the scripting he's doing on top of this to balance the refcount operations tree he's building.
    * For the data breakpoint on `mDevice`'s inner refcount, @ErichDonGubler has been using:
        ```
        --!! Ref count for device was modified:$CALLSTACK
        ```
    :::
1. Run the debugger (viz., <kbd>F5</kbd>) once, and wait until the crash reproduces.

:::warning
Great news! @ErichDonGubler has started writing a tool to consume these logs with Nical on [GitHub](https://github.com/nical/angle-refcnt-dbg/).
:::
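
As a rough illustration of what consuming such a log might involve, here is a minimal C++ sketch that splits `Output` window text into one record per breakpoint hit and counts the refcount modifications. It is not the code of the tool linked above. It assumes only that each hit begins with the `--!! ` marker used in the breakpoint messages, and that any following lines up to the next marker belong to that hit's callstack. The names `RefEvent`, `splitEvents`, and `countModifications` are hypothetical and exist only for this example.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// One logged breakpoint hit: the marker line plus whatever followed it.
struct RefEvent {
    std::string header;             // text after the "--!! " marker on that line
    std::vector<std::string> body;  // following lines, ASSUMED to be callstack frames
};

// Marker prefix taken from the breakpoint messages in this document.
static const std::string kMarker = "--!! ";

// Split raw `Output` window text into events. Lines that appear before the
// first marker are ignored, since they belong to no event.
std::vector<RefEvent> splitEvents(const std::string &log) {
    std::vector<RefEvent> events;
    std::istringstream in(log);
    std::string line;
    while (std::getline(in, line)) {
        const std::size_t pos = line.find(kMarker);
        if (pos != std::string::npos) {
            RefEvent event;
            event.header = line.substr(pos + kMarker.size());
            events.push_back(event);
        } else if (!events.empty() && !line.empty()) {
            events.back().body.push_back(line);
        }
    }
    return events;
}

// Count events produced by the data breakpoint on the refcount.
std::size_t countModifications(const std::vector<RefEvent> &events) {
    std::size_t count = 0;
    for (const RefEvent &event : events) {
        if (event.header.find("Ref count for device was modified") != std::string::npos) {
            ++count;
        }
    }
    return count;
}
```

Note that the log messages above do not record whether a modification was an increment or a decrement, so a count like this can only serve as a sanity check; actually pairing `AddRef` and `Release` calls requires inspecting the callstack in each record.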