Document owner | @ErichDonGubler |
Last updated | 2022-04-19 |
Status | Resolved. This document's contents are out-of-date. |
Backlink to rebase report | Link |
Currently, Mozilla's rebase of ANGLE v4 (see here for version/commit details) is running into a crash while calling IUnknown::Release on a specific COM object. This crash can be straightforwardly reproduced in WebGL with relatively simple shaders and GL commands that use a video element as the source of a 2D texture (see the repro steps' sample HTML source below). The crash appears to be caused by imbalanced COM reference counting on an ID3D11Device instance that is created and assigned to the Renderer11::mDevice member in the rx::Renderer11::callD3D11CreateDevice() method, such that a double-free occurs. Therefore, one of the following must be true for the COM object pointed to by rx::Renderer11::mDevice:
- IUnknown::AddRef is not being called enough.
- IUnknown::Release is being called too many times.

This imbalance is exposed in whichever of several execution paths happens to run last; the paths that we currently know of are:
- rx::Renderer11::release(), where several dozen decrements of mDevice occur.
- mozilla::gfx::SharedSurface_ANGLEShareHandle::~SharedSurface_ANGLEShareHandle(), while destroying the IDXGIKeyedMutex it extracts from ANGLE (viz., the one assigned to the mKeyedMutex member).

It is currently unclear whether this defective behavior is rooted in new ANGLE source, an older issue in Firefox code that is only now being exposed, or something else.
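To make the two possibilities above concrete, here is a minimal C++ sketch of COM-style reference counting. `FakeComObject` and `simulate` are hypothetical illustrations, not ANGLE, Gecko, or Windows APIs: the toy object does not free memory when its count reaches zero. It only records that it was "destroyed" and counts any later `Release()` calls, so the effect of one release too many can be observed without crashing.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a COM object. This is NOT IUnknown or
// ID3D11Device; it only mimics AddRef/Release bookkeeping.
class FakeComObject {
 public:
  std::uint32_t AddRef() { return ++refCount_; }

  // A real COM object would `delete this` when the count reaches 0.
  // This toy marks itself destroyed instead, and counts any further
  // Release() calls, each of which would be a use-after-free (the
  // double-free described above) on a real object.
  std::uint32_t Release() {
    if (destroyed_) {
      ++releasesAfterFree_;
      return 0;
    }
    if (--refCount_ == 0) destroyed_ = true;
    return refCount_;
  }

  bool destroyed() const { return destroyed_; }
  int releasesAfterFree() const { return releasesAfterFree_; }

 private:
  // Like a freshly created COM object, start with one reference owned
  // by the creator.
  std::uint32_t refCount_ = 1;
  bool destroyed_ = false;
  int releasesAfterFree_ = 0;
};

// Takes `extraAddRefs` references on top of the creator's one, then
// issues `releases` Release() calls. Balanced means
// releases == 1 + extraAddRefs.
FakeComObject simulate(int extraAddRefs, int releases) {
  FakeComObject obj;
  for (int i = 0; i < extraAddRefs; ++i) obj.AddRef();
  for (int i = 0; i < releases; ++i) obj.Release();
  return obj;
}
```

For instance, `simulate(2, 3)` is balanced: the object ends up destroyed with no late releases. Both `simulate(2, 4)` (an extra `Release`) and `simulate(1, 3)` (a missing `AddRef`) report one release after destruction. The bookkeeping alone cannot tell *which* caller was wrong; the problem only surfaces at whichever release happens to run last, which is why the crash can show up in more than one of the paths listed above.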
TODO: add source links
You will need:
- firefox.exe's child processes. The Windows SDK itself can be installed from this archive page.
  The Windows SDK component for your Visual Studio installation will also need to have a matching version installed. The easiest way to do this is to use the Visual Studio Installer application to Modify your installation's Individual components, like in these screenshots:
- 631c67e3, as observable from CI runs in Mozilla's Treeherder.
- sfz with no arguments in a folder with these files (AKA repro.zip).

The basic flow for reproduction is:
1. Change your working directory to your Firefox checkout (affectionately referred to as $GECKO_CHECKOUT going forward).
2. Run ./mach build with optimizations disabled and debug symbols enabled; for convenience, you can just use the following mozconfig file at the root of your Firefox checkout:
3. Run ./mach run --debug. This will generate and open a new Visual Studio solution.
4. Configure the solution's debugging to automatically attach to child processes:
   - Open Debug > Other Debug Targets > Child Process Debugging Settings...
   - Tick the Enable child process debugging checkbox, and save (i.e., Ctrl + S).
5. Configure the invocation of firefox.exe to open your repro.zip's index.html page automatically:
   - Extract the repro.zip archive from earlier into a directory somewhere. From this point, we'll call that "the repro directory".
   - Start serving files from your repro directory using a local HTTP server. As mentioned above, @ErichDonGubler used a naive invocation of sfz, which serves on port 5000 by default:
   - Open the solution's Properties from the Solution Explorer view, which by default is on the left side.
   - Add -new-tab arguments that point to where your local file server is serving the files from repro.zip. Continuing the above example, you can use -new-tab http://127.0.0.1:5000/index.html, as seen in this screenshot:
6. All you need to do now is start debugging in Visual Studio (i.e., press F5), and the problem should reproduce within twenty seconds or so. If it does not, the index.html tab probably didn't load all the way; refresh the page, and the crash should happen quickly, like in the following screenshot:
ℹ️ N.B.: you will encounter a debug break from what seems to be an assert in Microsoft code when the refcount of mDevice goes to 0. This is expected (and is probably Microsoft's code detecting that something is off), but it is not the crash that we're debugging. Example screenshot:
FIXME: s/751/756 in screenshots of breakpoint lines
Set a breakpoint in Visual Studio at $GECKO_CHECKOUT/gfx/angle/checkout/src/libANGLE/renderer/d3d/d3d11/Renderer11.cpp:756, immediately after the createDevice function pointer argument is called within Renderer11::callD3D11CreateDevice.
Using the address that mDevice gets set to, you will be able to get the address of the reference count itself.
You can create a data breakpoint in the Watch view for the refcount as follows:
- In the Add item to watch row, add the following expression:
  …where $ADDRESS_OF_COM_PTR is the address stored in the mDevice variable from the previous step.
- You can now set a memory breakpoint on the above Watch expression by right-clicking on it and clicking the Break When Value Changes item.
Building on the previous section, @ErichDonGubler has generated reports of the callstack for each refcounting operation via the Output window in Visual Studio. This is done by adding Actions to the settings of the breakpoints created in the previous section. Concretely:
This content has been outmoded by the live collaboration on GitHub below.
For the Renderer11::callD3D11CreateDevice breakpoint, @ErichDonGubler has been using:
…and yes, \n does not get translated to a newline, but that's workable for the scripting he's doing on top of this to balance the tree of refcount operations he's building.
For the data breakpoint on mDevice's inner refcount, @ErichDonGubler has been using:
Great news! @ErichDonGubler has started writing a tool with Nical on GitHub to consume these logs.
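The scripting and the GitHub tool mentioned above are not reproduced here. As a rough illustration of the balancing idea only, the C++ sketch below assumes the data-breakpoint hits have already been parsed into a hypothetical `RefcountEvent` list (a site label plus the new refcount value; the real Action format strings are not shown in this document). The hypothetical `balance` function then derives each change from consecutive values, totals the net change per site, and flags the event that first brings the count to 0 plus any event that touches the count afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// One hit of the "Break When Value Changes" data breakpoint. This
// structured form is a hypothetical stand-in for the real breakpoint
// Action output, which is not reproduced in this document.
struct RefcountEvent {
  std::string site;  // hypothetical: caller/callstack label logged by the action
  long newCount;     // refcount value observed after the change
};

struct BalanceReport {
  // Net refcount change attributed to each logged site.
  std::map<std::string, long> netBySite;
  // Index of the first event that brought the count to 0, if any.
  std::optional<std::size_t> firstZero;
  // Indices of events that changed the count after it had reached 0,
  // i.e., touches of an object that should already have been freed.
  std::vector<std::size_t> afterZero;
};

BalanceReport balance(long initialCount, const std::vector<RefcountEvent>& events) {
  BalanceReport report;
  long previous = initialCount;
  for (std::size_t i = 0; i < events.size(); ++i) {
    const RefcountEvent& e = events[i];
    // A data breakpoint only reports the new value, so derive the delta.
    report.netBySite[e.site] += e.newCount - previous;
    if (report.firstZero) {
      report.afterZero.push_back(i);
    } else if (e.newCount == 0) {
      report.firstZero = i;
    }
    previous = e.newCount;
  }
  return report;
}
```

For example, starting from a count of 1 with logged values of 2, 1, 0, and then a further change, `balance` reports index 2 as `firstZero` and index 3 in `afterZero`; that last event's site is where to start looking. After a free the memory may be reused, so values logged past that point should be treated with suspicion.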