# DISCUSSION: Prefetching strategy
###### tags: qwik
## Overview
Brainstorm on how Qwik should prefetch JS/CSS resources
## Prior Art
- [HTTP `Link` header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Link)
- [Link codelab](https://web.dev/codelab-two-ways-to-prefetch/)
- [QuickLink](https://getquick.link/) library
## Constraints
- We don't want prefetching to run on the main thread, because the extra work would show up in web-vitals metrics and the site would be penalized for it.
- Reached out to the Chrome team to see if they have a solution for this. Addy suggested the [QuickLink](https://getquick.link/) library.
## `IntersectionObserver`
- Ideally we would download only the code that is in the viewport.
- This can be approximated by doing `querySelectorAll('[on\\:.]')`. (However, this only retrieves the event handler `QRL`s. There are also internal `QRL`s which are not listed in the HTML.)
**Issues To Be Solved**
- `IntersectionObserver` needs a set of elements to watch. As the application runs the set changes. How do we deal with this?
- We could register DOM mutation events (Seems like too much overhead)
- We could periodically re-query (feels dirty)
- Retrieving `QRL`s that are used as dynamic imports in code and therefore do not show up in the static imports
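One piece the list above glosses over is turning the serialized attribute values into prefetchable URLs. A minimal sketch, assuming `QRL`s are serialized in a `./chunk.js#symbol` shape (an assumption about the format; the helper names are hypothetical):

```typescript
// Hypothetical helper: extract the chunk URL from a serialized QRL.
// Assumes a "./chunk-abc.js#symbolName" format; the real serialization
// used by Qwik may differ.
function extractQrlUrl(qrl: string): string {
  const hashIndex = qrl.indexOf('#');
  return hashIndex === -1 ? qrl : qrl.slice(0, hashIndex);
}

// Collect unique chunk URLs from a list of serialized QRL attribute values,
// so the same chunk is not prefetched twice.
function collectPrefetchUrls(qrls: string[]): string[] {
  return [...new Set(qrls.map(extractQrlUrl))];
}
```

Deduplicating here matters because many elements in the viewport typically point at the same chunk.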
## Web-Workers
Running pre-fetching in a web worker is desirable because it does not block the main thread and will not show up in web-vitals.
- Web-Worker can pre-fetch URLs using `fetch` or `XMLHttpRequest`. As long as the `Cache-Control` is correctly set the request will be cached for the main thread.
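A sketch of the worker side, with the fetch function injected so the queueing logic stays testable outside a worker; `PrefetchQueue` and its dedup behavior are illustrative, not Qwik's actual implementation:

```typescript
type FetchLike = (url: string) => Promise<unknown>;

// Illustrative prefetch queue for a web worker. URLs are deduplicated so a
// chunk is only requested once; the fetch function is injected so the logic
// can also run (and be tested) outside a worker.
class PrefetchQueue {
  private seen = new Set<string>();
  constructor(private fetchFn: FetchLike) {}

  async prefetch(urls: string[]): Promise<string[]> {
    const fresh = urls.filter((u) => !this.seen.has(u));
    fresh.forEach((u) => this.seen.add(u));
    // Fire the requests; with correct Cache-Control headers the responses
    // land in the HTTP cache and the main thread gets cache hits later.
    await Promise.all(fresh.map((u) => this.fetchFn(u)));
    return fresh;
  }
}
```

In a real worker the queue would receive URL batches via `postMessage` and construct it with the global `fetch`.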
**Problems**
- Neither `fetch` nor `XMLHttpRequest` honors [HTTP `Link` header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Link). This means that we need an alternative mechanism to deal with it.
- The current implementation manually reads the `Link` header, but that means it can only do so once the whole file is downloaded. (The browser could read the header as soon as it arrives and enqueue the linked resource without waiting for the full download.)
- If we could use module preload then the browser could also parse the content of the file (but not execute it.)
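Reading the header manually also means parsing it. A simplified parser for the common case (the full `Link` grammar in RFC 8288 allows more than this sketch handles, e.g. quoted parameters with escapes):

```typescript
// Parse an HTTP Link header value such as:
//   '</chunk-a.js>; rel=prefetch, </chunk-b.js>; rel="preload"'
// and return the URLs whose rel parameter matches. Simplified: does not
// cover every quoting edge case in the full RFC 8288 grammar.
function parseLinkHeader(header: string, rel: string): string[] {
  const urls: string[] = [];
  for (const part of header.split(',')) {
    const match = part.match(/<([^>]+)>\s*;\s*rel="?([^";]+)"?/);
    if (match && match[2] === rel) {
      urls.push(match[1]);
    }
  }
  return urls;
}
```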
## Push approach
A push approach would be an alternative to the `IntersectionObserver` approach. Instead of trying to determine which links are available for the user to interact with, the server could keep track of how users interact with the site and then come up with a sorted list of resources to download. The client would then download the content in that order.
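The ordering step could be as simple as sorting candidate chunks by observed interaction counts. A sketch with hypothetical names (`InteractionCounts`, `orderByUsage` are illustrative, not part of any real API):

```typescript
// Hypothetical shape: how often users actually triggered each chunk.
type InteractionCounts = Record<string, number>;

// Order candidate chunk URLs by real-world usage, most-used first.
// Chunks never observed default to a count of 0 and sort last.
function orderByUsage(urls: string[], counts: InteractionCounts): string[] {
  return [...urls].sort((a, b) => (counts[b] ?? 0) - (counts[a] ?? 0));
}
```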
**Advantages**
- Less code to ship to the client, since the client does not have to deal with `IntersectionObserver` or the complexities described in the `IntersectionObserver` section.
- The order of downloads would be based on actual user interactions. For example, `logout` may be the first link on the page, but since almost no one clicks it, it should be loaded later.
**Problems**
- **Cost** - we are going to generate a huge amount of data, even if we sample. There are hundreds of thousands of content entries in Builder, and right now our stack is very lean cost-wise. Even if we only sample, we would have to store the data somewhere and query it, which gets expensive to do in real time as we serve traffic, or complex to do in batches and always out of date (especially given how rapidly people update content).
- **Complexity** - we already do analytics for several different things, and that infrastructure could be reused here, but it would definitely be complex. We could go into how we do data collection and batching today, but it is something we try hard to avoid unless absolutely necessary. Our current infrastructure is designed not to be queried rapidly; batching can solve this but creates other maintenance and optimization challenges, especially at the volume of content we have. It is already a large job managing and optimizing our current analytics infrastructure for cost, scalability, accuracy, and performance, and ideally we want to reduce operational costs over time rather than add more (at least right now).
- **Efficacy** - Builder content changes constantly, by design. People pump out hundreds of pages, many of which get no traffic until all at once (e.g. a promo or product launch). People also rapidly update pages, which would constantly invalidate the collected data because page contents are always changing. So the worry is that even after paying the cost and complexity, this approach could frequently be out of date or mispredict, leading to slow interaction times.
### Collecting usage statistics
- Need a way to separate "pre-fetch" requests from "execution" requests. This can be done by adding an appropriate header to the "pre-fetch" request so that it can be excluded from the statistics.
- The Qwik client can record requests and ship them to the server so that the server can collect aggregate statistics about real-world use.
- The server can then push an ordered list of URLs to pre-fetch to the web worker.
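Putting the bullets above together, a sketch of the server-side aggregation. The `X-Qwik-Prefetch` marker header is a hypothetical name (no header is decided in this document), and the log-entry shape is assumed:

```typescript
// Hypothetical request log entry as shipped by the client.
interface RequestLogEntry {
  url: string;
  headers: Record<string, string>;
}

// Aggregate execution requests into per-URL counts, excluding requests
// carrying the (hypothetical) x-qwik-prefetch marker header so that
// pre-fetch traffic does not inflate the statistics; return the URLs
// most-used first, ready to be pushed to the web worker.
function aggregateUsage(log: RequestLogEntry[]): string[] {
  const counts = new Map<string, number>();
  for (const entry of log) {
    if (entry.headers['x-qwik-prefetch']) continue; // skip pre-fetch traffic
    counts.set(entry.url, (counts.get(entry.url) ?? 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([url]) => url);
}
```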