Rebase of Firefox onto ANGLE's chromium/5359 branch

Current status

2022-11-23: Now in review. Some relevant work remains outstanding, but this should be broken out as follow-up work.

Rebasing Firefox patches

Rebasing

This rebase of mozilla/angle will be marked as the firefox-111 branch and based on upstream's chromium/5359.

Main open questions from initial rebase
  • 3a1d51f5 was hard-blocking builds (-Wunreachable-code). @ErichDonGubler changed it to remove the remaining unreachable body of the function, since I feel like a theoretical merge conflict resolution would clearly present the intent.
    • KG: I think we don't need this commit anymore, but also don't remove code, use if (1) return; or similar.
  • Dunno if we need 400476a0b; it might be outmoded by 58930a73ce, with some migration?
    • Some new gn config got introduced for this. After a few minutes, I skipped analyzing it deeply in favor of leaning on your involvement from before.
    • Feeling super nervous about making a call about bitflags in src/compiler/translator/SymbolTable_ESSL_autogen.cpp, since I don't understand them.
    • KG: Just trust the autogen'd output.
  • Dropping 33ffc1233 in favor of b8d6f8aa93.
  • Dropping c79c27ff2, seems upstreamed by afda22b0.
  • Dropping 23851a53, seems upstream-fixed by 59f496c0.
  • Added a commit to change std::atomic<angle::Mutex*> g_{,Surface}Mutex(nullptr) to std::atomic<angle::Mutex*> g_{,Surface}Mutex{}.
  • Might not need 9312f40a any more, if upstream fixed?
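
The std::atomic change in the list above only swaps the initializer spelling. As a minimal sketch, the two forms below should start out holding the same null pointer. MutexSketch and both variable names are stand-ins invented for illustration, not ANGLE's actual declarations, and these notes do not record why the brace form was preferred, so no rationale is claimed here.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for angle::Mutex, for illustration only.
struct MutexSketch {};

// Old spelling: direct-initialization from nullptr.
static std::atomic<MutexSketch*> gMutexParen(nullptr);

// New spelling: empty-brace (value) initialization.
static std::atomic<MutexSketch*> gMutexBrace{};

// Returns true when both globals start out holding a null pointer.
static bool BothStartNull() {
    return gMutexParen.load() == nullptr && gMutexBrace.load() == nullptr;
}

// Usage sketch: the two spellings are expected to agree at startup.
static void CheckInitialState() {
    assert(BothStartNull());
}
```
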

Other items for discussion
  • 🛑 Definitely need to run codegen to resolve any divergences that manual editing may have caused; @ErichDonGubler is figuring this out at the moment.
  • 604cd6cae gets listed in cherry_picks.txt, but why?
    • TL;DA: @ErichDonGubler was confused, we changed the auto-generated script output a bit to be clearer.
  • Do we think that regressions are likely from this point?
    • KG: It kinda doesn't matter! (Because we don't have any other choice but to upgrade and find out)
  • @ErichDonGubler experienced some issues with git cl format specifying too many arguments on Windows; patching to chunk into ~500 files was necessary. Upstream?
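
The chunking workaround for the git cl format argument limit can be sketched as a plain batching helper. This is only an illustration of the idea; ChunkFiles is a hypothetical name, and the actual patch to the formatting tool is not reproduced in these notes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits `files` into consecutive batches of at most `maxPerBatch` entries,
// so that each batch can be passed to one tool invocation.
std::vector<std::vector<std::string>> ChunkFiles(const std::vector<std::string>& files,
                                                 std::size_t maxPerBatch) {
    assert(maxPerBatch > 0);  // a zero-sized batch would never make progress
    std::vector<std::vector<std::string>> batches;
    for (std::size_t start = 0; start < files.size(); start += maxPerBatch) {
        const std::size_t end = std::min(start + maxPerBatch, files.size());
        batches.emplace_back(files.begin() + static_cast<std::ptrdiff_t>(start),
                             files.begin() + static_cast<std::ptrdiff_t>(end));
    }
    return batches;
}
```

With a batch size of 500, a list of 1201 files would be split into three invocations of 500, 500, and 201 files.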

Bug-hunting

Compile issues

Fully resolved

ShCompileOptions changed

✔️ Found in 2022-11-15#1. Worked around compilation issues in 2022-11-15#2. Resolution: exhaustively specify all flags.

Unclear whether or not the set of specified flags is what we need long-term. Will let further testing and review determine what refinements may be necessary.

Discovered that ShCompileOptions constants like SH_VARIABLES were no longer defined. ShCompileOptions has apparently been migrated from a bitflags-based interface to a struct of bitfields.

Migrating the setting of individual flags is easy, but there is one hiccup: we currently rely on a way to specify "all flags" without enumerating each flag individually (options = -1; from dom/canvas/WebGLShaderValidator.cpp:ChooseValidatorCompileOptions).
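
To make the difference concrete, here is a minimal sketch of the two styles. Every name below (LegacyCompileOptions, CompileOptionsSketch, and the three option names) is hypothetical; the real ShCompileOptions has many more fields, and its actual field names are not listed in these notes.

```cpp
#include <cassert>
#include <cstdint>

// Old style: options are bits in an integer (hypothetical constants).
using LegacyCompileOptions = std::uint64_t;
constexpr LegacyCompileOptions kLegacyObjectCode    = 1u << 0;
constexpr LegacyCompileOptions kLegacyVariables     = 1u << 1;
constexpr LegacyCompileOptions kLegacyValidateLoops = 1u << 2;

// New style: one single-bit field per option (hypothetical field names).
struct CompileOptionsSketch {
    std::uint64_t objectCode : 1;
    std::uint64_t variables : 1;
    std::uint64_t validateLoops : 1;
};

// With bitflags, "all flags" is a single expression, like `options = -1`.
inline LegacyCompileOptions AllLegacyOptions() {
    return static_cast<LegacyCompileOptions>(-1);
}

// With bitfields, each option has to be named and set individually.
inline CompileOptionsSketch AllOptionsExhaustive() {
    CompileOptionsSketch options = {};
    options.objectCode    = 1;
    options.variables     = 1;
    options.validateLoops = 1;
    return options;
}

// Usage sketch: both spellings end up with the `variables` option enabled.
inline void CheckBothStyles() {
    assert((AllLegacyOptions() & kLegacyVariables) != 0);
    assert(AllOptionsExhaustive().variables == 1);
}
```

One consequence of the exhaustive style is that a field added upstream later stays 0 until someone lists it explicitly, which is presumably why the chosen set of flags is flagged for further review.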

astcenc_vecmathlib_* headers not found

✔️ Found in 2022-11-15#2. Resolution: define the new ASTCENC_DECOMPRESS_ONLY #define in update-angle.py.

#include "astcenc_vecmathlib_sse_4.h" failed to compile in checkout/third_party/astc-encoder/src/Source/astcenc_mathlib.h, because these headers were not discovered in the build graph when it was generated via ninja.

Apple platforms failed to compile astcenc stuff

✔️ Resolution: define the new ANGLE_ENABLE_APPLE_WORKAROUNDS #define in update-angle.py.

exit-time-destructors error in Windows

✔️ Found in 2022-11-15#1. Resolution: worked around in our fork.

std::mutex was incorrectly assumed to be trivially destructible in upstream's Renderer11 API on Windows. This has been resolved by applying the angle::base::NoDestructor type wrapper around the static Renderer11::gMutex member.
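
The idea behind such a wrapper can be sketched as follows. NoDestructorSketch and RendererSketch are simplified, hypothetical stand-ins, not ANGLE's angle::base::NoDestructor or Renderer11, and a function-local static is used here instead of a static class member to keep the example short.

```cpp
#include <cassert>
#include <mutex>
#include <new>
#include <type_traits>
#include <utility>

// Constructs a T in inline storage and never runs T's destructor, so the
// wrapper itself is trivially destructible and needs no exit-time destructor.
template <typename T>
class NoDestructorSketch {
  public:
    template <typename... Args>
    explicit NoDestructorSketch(Args&&... args)
        : ptr_(new (storage_) T(std::forward<Args>(args)...)) {}

    NoDestructorSketch(const NoDestructorSketch&) = delete;
    NoDestructorSketch& operator=(const NoDestructorSketch&) = delete;

    T& operator*() { return *ptr_; }
    T* operator->() { return ptr_; }

  private:
    alignas(T) unsigned char storage_[sizeof(T)];
    T* ptr_;  // points into storage_; the T is intentionally never destroyed
};

static_assert(std::is_trivially_destructible<NoDestructorSketch<std::mutex>>::value,
              "the wrapper must not introduce an exit-time destructor");

// Hypothetical stand-in for a renderer that owns a process-wide mutex.
struct RendererSketch {
    static std::mutex& Mutex() {
        static NoDestructorSketch<std::mutex> mutex;
        return *mutex;
    }
};

// Usage sketch: the mutex is usable and is the same object on every call.
inline void CheckMutexIdentity() {
    assert(&RendererSketch::Mutex() == &RendererSketch::Mutex());
}
```

The trade-off is that the wrapped object is deliberately leaked at process exit, which is harmless for a mutex that lives for the whole process anyway.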

  • Submit a patch to ANGLE upstream

Pre-nightly run-time issues

Status: after a long and drawn-out set of issues, it appears we are finally good to try merging this rebase to mainline!

Fully resolved

Refcounting crash around ID3DDevice

Destructor crash for rx::Renderer11

Crash in EglDisplay::fTerminate

Destructor crash for Swapchain11's IDXGIKeyedMutex

✔️ Discovered in 2022-11-29#1. Reported here. Mainline fix posted for review here. See here for instructions to reproduce.

This seems like a correct change, but there is still some uncertainty about whether it will expose other problems.

An unbalanced set of refcount operations in Firefox led to consistent use-after-free crashes while the ANGLE/D3D11 back end was being torn down. This issue already existed in Firefox, but was not exposed until this update.

  • Did the fix land?

ovr_multiview2_draw_buffers test cases fail

✔️ Discovered in 2023-01-09#1. Worked around in D162655.

The gl2c test group for Windows consistently failed with:

TEST-UNEXPECTED-FAIL | dom/canvas/test/webgl-conf/generated/test_2_conformance2__extensions__ovr_multiview2_draw_buffers.html | the right edge of view 0 of color attachment 1 should be untouched
TEST-UNEXPECTED-FAIL | dom/canvas/test/webgl-conf/generated/test_2_conformance2__extensions__ovr_multiview2_draw_buffers.html | the left edge of view 1 of color attachment 1 should be untouched
TEST-UNEXPECTED-FAIL | dom/canvas/test/webgl-conf/generated/test_2_conformance2__extensions__ovr_multiview2_draw_buffers.html | the right edge of view 1 of color attachment 1 should be untouched
TEST-UNEXPECTED-FAIL | dom/canvas/test/webgl-conf/generated/test_2_conformance2__extensions__ovr_multiview2_draw_buffers.html | the left edge of view 2 of color attachment 1 should be untouched
TEST-UNEXPECTED-FAIL | dom/canvas/test/webgl-conf/generated/test_2_conformance2__extensions__ovr_multiview2_draw_buffers.html | the right edge of view 2 of color attachment 1 should be untouched
TEST-UNEXPECTED-FAIL | dom/canvas/test/webgl-conf/generated/test_2_conformance2__extensions__ovr_multiview2_draw_buffers.html | the left edge of view 3 of color attachment 1 should be untouched
TEST-UNEXPECTED-FAIL | dom/canvas/test/webgl-conf/generated/test_2_conformance2__extensions__ovr_multiview2_draw_buffers.html | the right edge of view 0 of color attachment 2 should be untouched
TEST-UNEXPECTED-FAIL | dom/canvas/test/webgl-conf/generated/test_2_conformance2__extensions__ovr_multiview2_draw_buffers.html | the left edge of view 1 of color attachment 2 should be untouched
TEST-UNEXPECTED-FAIL | dom/canvas/test/webgl-conf/generated/test_2_conformance2__extensions__ovr_multiview2_draw_buffers.html | the right edge of view 1 of color attachment 2 should be untouched
TEST-UNEXPECTED-FAIL | dom/canvas/test/webgl-conf/generated/test_2_conformance2__extensions__ovr_multiview2_draw_buffers.html | the left edge of view 2 of color attachment 2 should be untouched
TEST-UNEXPECTED-FAIL | dom/canvas/test/webgl-conf/generated/test_2_conformance2__extensions__ovr_multiview2_draw_buffers.html | the right edge of view 2 of color attachment 2 should be untouched
TEST-UNEXPECTED-FAIL | dom/canvas/test/webgl-conf/generated/test_2_conformance2__extensions__ovr_multiview2_draw_buffers.html | the left edge of view 3 of color attachment 2 should be untouched 

EGL_FLEXIBLE_SURFACE_COMPATIBILITY_SUPPORTED_ANGLE removed

✔️ This issue appears to have been conclusively resolved. We will keep an eye out for regressions. Originally found with 2022-11-21#1.

Before this rebase, Fx's ANGLE back end for WebRender used EGL_FLEXIBLE_SURFACE_COMPATIBILITY_SUPPORTED_ANGLE as a context creation attribute. This has been removed in Chromium upstream in favor of the EGL_KHR_no_config_context extension, according to the upstream commit message removing it. With this rebase, usage of the old attribute would return EGL_BAD_ATTRIBUTE (visible as 0x3004 in failure logs).
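
As a rough illustration of dropping one key from an EGL attribute list, the helper below filters an EGL_NONE-terminated list of key/value pairs. It is a sketch only: WithoutAttribute is a hypothetical helper, not the change made in Firefox, and the numeric value given for the removed attribute is an assumption rather than something taken from ANGLE's headers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// EGL_NONE terminates attribute lists (value from the EGL specification).
constexpr int kEglNone = 0x3038;

// Placeholder for the removed EGL_FLEXIBLE_SURFACE_COMPATIBILITY_SUPPORTED_ANGLE
// key. The numeric value is an assumption made for this sketch.
constexpr int kRemovedAttribSketch = 0x33A6;

// Copies an EGL_NONE-terminated list of key/value pairs, skipping every pair
// whose key equals `removedKey`, and re-terminates the result with EGL_NONE.
std::vector<int> WithoutAttribute(const std::vector<int>& attribs, int removedKey) {
    std::vector<int> filtered;
    for (std::size_t i = 0; i + 1 < attribs.size() && attribs[i] != kEglNone; i += 2) {
        if (attribs[i] != removedKey) {
            filtered.push_back(attribs[i]);
            filtered.push_back(attribs[i + 1]);
        }
    }
    assert(filtered.size() % 2 == 0);  // only whole key/value pairs were copied
    filtered.push_back(kEglNone);
    return filtered;
}
```
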

Stack overflow in IntermNode.cpp's PropagatePrecision methods

✔️ Some AST traversal/manipulation changed in ANGLE upstream, and it overflows the stack with our current thread stack sizes. Resolved by increasing the thread stack size of CanvasRendererThread. Discovered in 2022-11-29#1.

  • Looks resolvable in ANGLE upstream. Get this fixed?
    • Get MVRE in here.
    • Either stack sizes will need to be adjusted in client code (docs plz), or make a new thread under the hood on the ANGLE side.
    • Investigate why stack sizes increased so much between old and new versions in this upgrade.
  • This seems to reveal some architecture issues with Fx code; bringing down WebGL context operations because of a shader compilation crash is not an acceptable regression.
    • @UAJMTBhySlWm5X4K8GRAgg stated that this is likely to be a DoS problem for Mac, if we use ANGLE for shader compilation; there's no isolated GPU process on that platform.
    • KG: This kind of crash-on-bad-enough-shader isn't a hard-blocker for us to update, but it should be on our radar and receive its own prioritization.
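
For the "adjust stack sizes in client code" option, a minimal POSIX-only sketch of running recursive work on a thread with an explicitly chosen stack size is shown below. It is not how Firefox configures CanvasRendererThread; RunOnThreadWithStack, VisitSketch, and TraversalJobSketch are hypothetical names, and the recursion merely stands in for a deep AST traversal.

```cpp
#include <pthread.h>

#include <cassert>
#include <cstddef>

// Hypothetical job: recurse `depth` levels and count the visits.
struct TraversalJobSketch {
    int depth;
    int visited;
};

// Stand-in for a recursive AST walk; each level uses some stack space.
static int VisitSketch(int depth) {
    volatile char scratch[256];
    scratch[0] = static_cast<char>(depth);
    if (depth <= 0) {
        return 0;
    }
    return 1 + VisitSketch(depth - 1);
}

static void* RunJob(void* arg) {
    TraversalJobSketch* job = static_cast<TraversalJobSketch*>(arg);
    job->visited = VisitSketch(job->depth);
    return nullptr;
}

// Runs the traversal on a new thread whose stack is `stackBytes` large.
// Returns the number of visited levels, or -1 if the thread could not be set up.
static int RunOnThreadWithStack(std::size_t stackBytes, int depth) {
    assert(depth >= 0);
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0) {
        return -1;
    }
    if (pthread_attr_setstacksize(&attr, stackBytes) != 0) {
        pthread_attr_destroy(&attr);
        return -1;
    }
    TraversalJobSketch job{depth, 0};
    pthread_t thread;
    const int rc = pthread_create(&thread, &attr, RunJob, &job);
    pthread_attr_destroy(&attr);
    if (rc != 0) {
        return -1;
    }
    pthread_join(thread, nullptr);
    return job.visited;
}
```

The alternative raised above, having ANGLE spawn its own thread under the hood, would move this kind of setup to the library side so that clients would not need to know the required stack size.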

Issues discovered after landing

Post-update action items

Snap builds broken

✔️ Bug 1812260: snap builds broken after missing. Should be resolved with D167815.

Try build history

At @jgilbert's recommendation, @ErichDonGubler has been using the union of the following two ./mach try fuzzy queries:

  • !asan !tsan !plat !js !talos 'mochi 'webgl
  • !asan !tsan !plat !js !talos 'reftest

Old builds

2022-11-15

  • 1: somewhat behind tip at time of pushing.
  • 2: rebased onto latest tip as of 2022-11-15.

2022-11-16

  • 1
  • 2
  • 3: fixed builds for Windows and Mac
  • 4

2022-11-16

  • 1: attempt to fix non-spurious Mac test failures

2022-11-18

  • 1: rebased onto latest tip from mozilla-central.

2022-11-21

  • 1: First non-mach try auto build, based on the first version of the last revision's review.
    • Failures:
      • The vast majority of failures in this build occurred because of an issue with EGL_FLEXIBLE_SURFACE_COMPATIBILITY_SUPPORTED_ANGLE being removed. These will not be noted explicitly.
      • Linux WebRender opt
        • bc2 appears to be intermittent.
        • bc15 appears to be intermittent.
      • Windows 10 x86 WebRender
        • opt
          • Mochitest with WebGL over IPC's gl2c is intermittent.

2022-11-29

  • 1: remove the now-invalid EGL_FLEXIBLE_SURFACE_COMPATIBILITY_SUPPORTED_ANGLE attribute
    • Failures
      • Windows 10 x64 WebRender opt:
        • M fails on:
          • gl1c (retried, failed)
          • gl1e (retried, failed): appears to be an ANGLE destructor issue
          • gl2c (retried, failed): appears to be an ANGLE destructor issue
          • gl2e3 (retried, failed): appears to be an ANGLE destructor issue
        • M-gli fails on:
          • gl1c (retried, failed)
          • gl2c (retried, failed): failed to fetch WebGL rendering context
          • gl1e (retried, failed): unexpected failures in shader-uniform-packing-restrictions.html caused by a stack overflow
          • gl2e3 (retried, failed): appears to be an ANGLE destructor issue
          • gl2e4 (retried, failed): unexpected failures in shader-uniform-packing-restrictions.html caused by a stack overflow
      • Windows 10 x64 WebRender debug:
        • M fails on:
          • gl1c (retried, failed)
          • gl2c (retried, failed)
          • gl2e3 (retried, failed)
        • M-gli fails on:
          • gl1c (retried, failed)
          • gl2c (retried, failed)
          • gl2e3 (retried, failed)
          • gl2e4 (retried, failed)
      • Linux 18.04 x64 WebRender opt and debug:
        • All of these appear to be a stack overflow bug.
        • gl1e (retried, failed)
        • gl2e3 (retried, failed)
        • gl2e4 (retried, failed)

2022-12-02

  • 1: 2022-11-29#1, but rebased onto latest tip from mozilla-central.

2022-12-16

2023-01-09

Later builds

Fine-grained tracking of failures was stopped for a while, since attention was devoted solely to the fix that this build ended up providing.

2023-01-17

2023-01-18

  • 1: First attempt to autoland. Revealed some more issues:
    • The PoolAlloc allocator makes asan angry on Linux! TODO: issue entry above?
  • 2: attempting to fix asan violations in PoolAlloc on Linux, per @jgilbert's suggestion. This failed, but nicely demonstrates failures from 1 above.
  • 3: actually try to fix things; @ErichDonGubler hadn't propagated the ANGLE_DISABLE_POOL_ALLOC compilation flag properly in 2.

Patches

Patch stacks in Lando: