# JS Fetcher
`JSFetcher` is a Python-based tool designed to fetch, save, and analyze JavaScript files from a list of provided URLs. It supports features such as proxy usage, custom headers, URL extraction, JavaScript beautification, and optional retrieval of source maps.
## Features
:::success
- **JavaScript Beautification**: Beautify fetched JavaScript files for easier analysis.
- **URL Extraction**: Extract additional `.js` or `.chunk.js` URLs from fetched content and process them recursively.
- **Source Map Retrieval**: Optionally fetch mapping files (`.js.map` and `.map`), both those referenced in JavaScript files and those guessed by appending `.map` to the JavaScript filename.
- **Output Directory**: Save fetched content to a specified directory, preserving the directory structure.
- **Multithreading**: Fetch multiple URLs concurrently using a specified number of threads.
- **Custom Headers**: Add custom headers to requests for authentication or other purposes.
- **Proxy Support**: Use a proxy for requests with validation checks.
- **Retry Mechanism**: Retry failed requests with an exponential backoff.
:::
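The retry feature above can be pictured with a minimal sketch. This is not the tool's actual code: the function name `fetch_with_retry`, its parameters, and the choice of `OSError` as the retryable error are assumptions made for illustration only.

```python
import time


def fetch_with_retry(fetch, retries=1, base_delay=0.5, sleep=time.sleep):
    """Call fetch() and retry up to `retries` times on OSError.

    Hypothetical helper: the delay doubles after each failed attempt
    (base_delay, 2 * base_delay, 4 * base_delay, ...).
    """
    attempt = 0
    while True:
        try:
            return fetch()
        except OSError:
            if attempt >= retries:
                raise  # no retries left, propagate the last error
            sleep(base_delay * (2 ** attempt))
            attempt += 1
```

Passing `sleep` as a parameter keeps the sketch easy to test without real waiting.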
## Requirements
- Python 3.7+
## Download
The latest version of `JS Fetcher` can be downloaded at the following URL:
```bash
wget https://static.k0.lc/share/js_fetcher.py
```
## Installation & run
The following libraries are required for this project:
```bash
pip3 install docopt jsbeautifier coloredlogs verboselogs tldextract==3.2.0
python3 js_fetcher.py -h
```
## Usage
```text
Usage:
js_fetcher.py [--scope <scope>] [--mapping-search] [--outdir <directory>] [--proxy <proxy>] [(-H <header>)...]
[--retry <num_retries>] [--threads <num_threads>] [--timeout <seconds>] [--follow-redirect]
[--only-request-js-files] [--only-save-js-files | --only-save-orig-js-files] [--no-save-embedded-js]
[--response-min-size <size>] [--disable-js-beautify] [--disable-js-embedded] [--disable-url-search]
[--disable-proxy-check] [--disable-recursion] [--disable-cpath-filter] [--headers-from-file <file>]
[--keep-minified-content] [-v | -d | -dd] -u <URL>
Program options:
-u,--url <URL> URL(s) to fetch, could be a filename, a string a comma separated string list or a list
-s,--scope <scope> Set domain(s)/subdomain(s) as a valid scope instead of root_url. Multi-format (like --url)
-m,--mapping-search Attempt to recover '.js.map' files for all '.js' files found
-o,--outdir <directory> Directory to save the responses
-x,--proxy <proxy> Proxy to use for requests
-H,--headers <header>... Custom headers for the requests
-r,--retry <num_retries> Number of retries on failure [default: 1]
-t,--threads <num_threads> Number of threads to use [default: 5]
-T,--timeout <seconds> Timeout for each request in seconds [default: 8]
Filter options:
--only-request-js-files Input Filter: Clean '--url' argument to keep and only request '.js' files
--only-save-js-files Output filter: Save only beautified '.js' and '.js.map' files in output directory
--only-save-orig-js-files Output filter: Save only original '.js' and '.js.map' files in output directory
--no-save-embedded-js Output filter: No save in separated file embedded JavaScript code found in script tag
--response-min-size <size> Output filter: Skip all URLs whose response size is shorter than this value [default: 50]
Misc options
--follow-redirect Follow redirection when request responds an HTTP/3xx redirection
--disable-cpath-filter Disables consecutive paths detection and filter mechanism (ex: '../assets/assets/..')
--disable-js-beautify Disables the JavaScript beautification mechanism applied to '.js' files
--disable-js-embedded Disables the embedded JavaScript code recovery mechanism applied to non- '.js' files
--disable-url-search Disables the '.js' and '.chunk.js' recovering mechanism applied to fetched URLs
--disable-proxy-check Disables proxy check mechanism when proxy is present
--disable-recursion Disables recursion, the program stops when the first-level URLs are retrieved
--headers-from-file <file> Get raw request headers from file
--keep-minified-content Keep a copy of original minified content (.minified.js) for further unmapping with '.map'
General options:
-v, --verbose Enable verbose mode
-d, --debug Show more details on what the program does under the hood
-dd, --debug Print Debug level 2 (with all classes debug_class output)
-V, --version Show version info
```
## More about supported arguments
### Arguments parsing
`JS Fetcher` lets you define some arguments in several ways:
- `-u,--url` and `-s,--scope` arguments can be a filename, a string, a comma-separated string list or a list (when `JS Fetcher` is used as a library);
- `-H,--headers` can be defined multiple times (like `curl`);
- `stdin` (with `-`) is supported for all these arguments.
For example, to define one or several target URLs (`-u,--url`), all of the following forms are accepted:
```bash
js_fetcher -u /path/urls
js_fetcher -u http://www.example.com/app/
js_fetcher -u "http://target.tld.com/app/index, https://target.tld.com/file.js"
cat /path/urls | js_fetcher -u -
echo 'https://target.tld.com/file.js' | js_fetcher -u -
```
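The accepted input forms shown above could be normalized along these lines. This is only a sketch under stated assumptions: `normalize_targets` is a hypothetical name, and the order of the checks (list, then `-`, then filename, then comma-separated string) is a guess rather than the tool's documented behavior.

```python
import os
import sys


def normalize_targets(value, stdin=sys.stdin):
    """Return a clean list of URLs from any of the supported input forms.

    Hypothetical helper, not part of js_fetcher.py.
    """
    if isinstance(value, (list, tuple, set)):
        items = list(value)                   # library mode: already a collection
    elif value == "-":
        items = stdin.read().splitlines()     # one URL per line on stdin
    elif os.path.isfile(value):
        with open(value, encoding="utf-8") as handle:
            items = handle.read().splitlines()  # one URL per line in a file
    else:
        items = value.split(",")              # single URL or comma-separated list
    # Drop surrounding whitespace and empty entries.
    return [item.strip() for item in items if item.strip()]
```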
### Scope
By default, if the tool finds a complete URL that doesn't match the root URL, it marks it as out-of-scope (`OOS`).
For example, if all the `.js` files on the home page of **www.target.com** are linked to **assets.target.com**, these files will be rejected by default unless the argument `-s *.target.com` is passed to the program.
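As an illustration of the scope rule, here is a minimal sketch that compares hostnames and treats scope entries as shell-style wildcards. The function `in_scope` and its matching logic are assumptions; the tool's real scope check may differ (for instance in how it treats the bare domain `target.com` against `*.target.com`).

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def in_scope(url, root_url, scopes=()):
    """Return True if `url` is in scope (hypothetical helper).

    Without explicit scopes, only the host of the root URL is accepted.
    With scopes, the host must match at least one wildcard pattern.
    """
    host = (urlsplit(url).hostname or "").lower()
    if not scopes:
        return host == (urlsplit(root_url).hostname or "").lower()
    return any(fnmatchcase(host, pattern.lower()) for pattern in scopes)
```

With this sketch, `https://assets.target.com/app.js` is rejected when the root URL is `https://www.target.com/`, and accepted once `*.target.com` is passed as a scope.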
### Filters
It is also possible to filter the program's inputs/outputs:
+ **Input filter** `--only-request-js-files`: Clean `--url` argument to keep and only request `.js` files;
+ **Output filter** `--only-save-js-files`: Save only `.js` files (beautified by default) and `.js.map` files in output directory;
+ **Output filter** `--only-save-orig-js-files`: Save only original `.js` files (not beautified) and `.js.map` files in output directory.
### Internal JS beautify mechanism
By default, all fetched `.js` files are *beautified*, to make it easier and more efficient for the program to search for patterns in minified files.
The tool uses the Python version of the [js-beautify](https://github.com/beautifier/js-beautify) library, configured with the following options:
```python
{
"indent_size": 4,
"indent_char": " ",
"max_preserve_newlines": 2,
"preserve_newlines": True,
"keep_array_indentation": True,
"break_chained_methods": True,
"indent_scripts": "normal",
"brace_style": "collapse",
"space_before_conditional": False,
"unescape_strings": True,
"jslint_happy": True,
"end_with_newline": True,
"wrap_line_length": 200,
"indent_inner_html": True,
"comma_first": False,
"e4x": False,
"indent_empty_lines": False
}
```
If your main goal is simply to beautify JavaScript code, you can adjust these parameters in the source code: in the `JSFetcher.get_beautifier_config()` method, and in the two global variables of the same class, `BEAUTIFIER_INDENT_SIZE` and `BEAUTIFIER_WRAP_LINE_LENGTH`.
:::warning
If you modify this configuration or disable the beautification mechanism with the `--disable-js-beautify` option, the program may no longer find additional URLs in `.js` files.
:::
These parameters were initially obtained from (and can be tested in) the online version of this library: [https://beautifier.io/](https://beautifier.io/).
## Practical examples
### Retrieve all JavaScript code of target(s)
Classic use of this tool. It takes a list of URLs and tries to recover as much JavaScript code as possible, saving each recovered file. Beautification is automatically applied to each valid JavaScript file.
```bash
js_fetcher -u /tmp/urls.txt -s "*.target1.fr, *.target2.com" -r 2 -t 8 -T 10 -o /tmp/target-jscode/ --only-save-js-files --follow-redirect -d
```
### Just beautify JavaScript code of target(s)
If the goal is simply to beautify and save all the JavaScript files in a list of URLs, you can use the filters to keep only JavaScript code as input/output and disable (or not) the search for additional `.js` files.
```bash
js_fetcher -u /tmp/urls.txt --only-request-js-files -t 8 -o /tmp/beautified-jscode/ --only-save-js-files --disable-url-search -v
```
### Get mapping file of any fetched JavaScript URLs
With the `-m,--mapping-search` option, the tool also searches for the mapping file (`.js.map`) associated with each fetched `.js` URL.
```bash
js_fetcher -u /tmp/urls.txt --only-request-js-files -t 8 -o /tmp/jscode-with-mapping/ -m -v
```
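To make the mapping search more concrete, the sketch below builds candidate source map URLs for one fetched `.js` file: those referenced by a `sourceMappingURL` comment, plus the URL guessed by appending `.map`. The regex, the function name `source_map_candidates`, and the handling of query strings are assumptions for illustration, not the regexes used by the tool.

```python
import re
from urllib.parse import urljoin

# Matches "//# sourceMappingURL=..." (and the legacy "//@" form).
SOURCE_MAP_COMMENT = re.compile(r"//[#@]\s*sourceMappingURL=(\S+)")


def source_map_candidates(js_url, js_content):
    """Return candidate mapping file URLs for a fetched .js URL (hypothetical helper)."""
    candidates = []
    for ref in SOURCE_MAP_COMMENT.findall(js_content):
        if not ref.startswith("data:"):        # embedded maps need no request
            candidates.append(urljoin(js_url, ref))
    # Guess: same URL without its query string, with ".map" appended.
    guessed = js_url.split("?", 1)[0] + ".map"
    if guessed not in candidates:
        candidates.append(guessed)
    return candidates
```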
### Just replay a list of URLs through a proxy server (like Burp)
To replay a list of URLs through a proxy server, there's no need to waste time beautifying JavaScript code or searching for patterns to discover other URLs:
```bash
js_fetcher -u /tmp/urls.txt -r 2 -t 8 -T 10 --follow-redirect --disable-js-beautify --disable-url-search -x http://127.0.0.1:8080
```
## Changelog
### Version 2.5
:::spoiler
Version improvements:
- Add `try_webpack_without_key` mode (global variable, True by default) to support a special case seen in `Nuxt.js`.
Example: `"js/" + { 0: "6aaec45", ... } [e] + ".js"` => `js/6aaec45.js`
:::
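The transformation in the version 2.5 example can be sketched as follows. This only illustrates the shape of the pattern (prefix string, object literal indexed by a variable, suffix string); the regexes and the name `expand_chunk_map` are assumptions and are not the `REGEX_WEBPACK_*` expressions used by the tool.

```python
import re

# "prefix" + { key: "value", ... } [var] + "suffix"
CHUNK_MAP = re.compile(r'"([^"]*)"\s*\+\s*\{([^}]*)\}\s*\[\w+\]\s*\+\s*"([^"]*)"')
# One entry of the object literal: key: "value"
ENTRY = re.compile(r'[\w"]+\s*:\s*"([^"]+)"')


def expand_chunk_map(js):
    """Return every URL obtained by joining prefix + value + suffix (hypothetical helper)."""
    urls = []
    for prefix, body, suffix in CHUNK_MAP.findall(js):
        for value in ENTRY.findall(body):
            urls.append(prefix + value + suffix)
    return urls
```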
### Version 2.4
:::spoiler
Version improvements:
- Improve `REGEX_JS_URL_3` and `REGEX_END_BY_JS_OR_MAP` regexes to support mapping files ending by `.map` (instead of `.js.map`).
:::
### Version 2.3
:::spoiler
Version improvements:
- Fixed a bug that prevented URLs running on a port other than 80 or 443 (ex: `http://app.domain.com:3000/`) from being processed;
- Improve `REGEX_WEBPACK_2` regex to support a new loader format without a prefix folder and using a `.` (instead of `-`) as separator (encountered on an *AngularJS* frontend).
:::
### Version 2.2
:::spoiler
Version improvements:
- Add new `REGEX_WEBPACK_3` regex to support a special webpack loader (seen on *boutique.orange.fr*);
- Improve `REGEX_JS_URL_2` regex to support `.js` URLs containing a cache busting suffix (ex: `/assets/file.js?v=2`);
- Improve `REGEX_SCRIPT` regex and normalize *fake* `.embedded.js` URL(s).
:::
### Version 2.1
:::spoiler
Version improvements:
- Important bug fix in `UTF-8` string encoding: the default `strict` mode "lost" part of the content of some files instead of raising an exception. All encode/decode calls are now set to `errors="replace"`;
- Added *fake urls* `.embedded.js` for files containing embedded JavaScript code to the results displayed by the tool on `STDOUT` and in log file.
:::
### Version 2.0
:::spoiler
Version improvements:
- Add a `references` key to fetched `.map` files containing the URL of the JavaScript file(s) linked to this mapping file:

This reference will be used later by `js_unmap` when unmapping the `index.js` file to find out where it came from:

- Added a new `--headers-from-file <file>` option to retrieve all headers present in a raw request pasted into a file;
- Improve some details in logging: Program shows more info in debug mode `-d` and `OOS` URLs now issue a warning and are displayed in all display modes;
- Add class comparison and hashing functions and docstrings harmonization.
:::
### Version 1.9
:::spoiler
Version improvements:
- Add support for files embedding JavaScript code (in `<script>` tags). Very useful when using the home page or `.html` files as a starting point. This behavior can be disabled with a new `--disable-js-embedded` option;
- Javascript source code embedded in `<script>` tags (in files not ending with `.js`) is saved by default in a file with the extension `embedded.js`. If you don't want to save this code, you can use the new `--no-save-embedded-js` option:

- Add mapping coverage statistics, with two types of coverage percentages for `.map` files in the results URL list:

- Exact match coverage: percentage of files that have their exact corresponding mapping file;
- Global coverage: percentage based on the total number of `.map` files versus other files.
- Internal doc, minor code refactoring and typo fixes.
:::
### Version 1.8
:::spoiler
Version improvements:
- Major bug fix in `REGEX_JS_URL_3` regex;
- Get additional `.js` from `file` key value in source map files if present;
- Get additional `.js` file by removing `.map` in `.js.map` urls;
- Remove useless filter in `URLExtractor.extract_*`;
- Improve some details in debug_class logging (`-dd`).
:::
### Version 1.7
:::spoiler
Version improvements:
- Add new `--only-save-min-js-files` misc option to save the original minified content instead of its beautified version for further unmapping with the related `.js.map` file;
- Small refactoring of `fetch_url()` method to deal with the new options;
- Parse discovered `.js.map` files to keep only valid `JSON` source map files;
- Format / pretty print content of discovered `.js.map` files before saving;
- Various minor bug, internal doc and typo fixes.
:::
### Version 1.6
:::spoiler
Version improvements:
- Add new `--keep-minified-content` misc option to keep, in addition to beautified `.js` file, a copy of original minified content (`.minified.js`) for further unmapping with related `js.map` file;
- Improve `REGEX_JS_URL_2` to avoid matching `.json` in addition to `.js`;
- Improve `REGEX_JS_URL_3` regex to detect (sourceMappingURL|sourceURL), according to official documentation: https://tc39.es/source-map/#linking-generated-code;
- Add an additional security filter to skip invalid `.js` or `.js.map` matches from general regex;
- Add python `set` support for library mode;
- Improve exception handling in proxy check method (now in Tools class);
- Various minor bug, internal doc and typo fixes.
:::
### Version 1.5
:::spoiler
Version improvements:
- Add `nextJS` support;
- Add embedded mapping files support (`sourceMappingURL=data...base64,xxx`);
- Add consecutive paths detection (ex: `x/v1/assets/v1/assets/x.js` => `x/v1/assets/x.js`);
- Add 2 new misc options:
+ `--disable-recursion`: Disables recursion, the program stops when the first-level URLs are retrieved;
+ `--disable-cpath-filter`: Disables consecutive paths detection and filter mechanism. Useful for debugging the program;
- Avoid beautifying the content of fake `HTTP/200` HTML error pages;
- Various bug and typo fixes.
:::
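The consecutive paths detection introduced in version 1.5 can be illustrated with a small sketch that removes an immediately repeated run of path segments. The function `collapse_repeated_segments` is hypothetical and the tool's real filter may use a different strategy.

```python
def collapse_repeated_segments(path):
    """Collapse immediately repeated runs of segments in a path.

    Hypothetical helper: 'x/v1/assets/v1/assets/x.js' -> 'x/v1/assets/x.js'.
    """
    parts = path.split("/")
    changed = True
    while changed:
        changed = False
        # Try runs of 1 segment, then 2 segments, and so on.
        for size in range(1, len(parts) // 2 + 1):
            for start in range(len(parts) - 2 * size + 1):
                first = parts[start:start + size]
                second = parts[start + size:start + 2 * size]
                if first == second:
                    del parts[start + size:start + 2 * size]  # drop the duplicate run
                    changed = True
                    break
            if changed:
                break
    return "/".join(parts)
```

A filter like this can also collapse paths that legitimately repeat a segment, which is presumably why the `--disable-cpath-filter` option exists.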
### Version 1.1
:::spoiler
Version improvements:
- Improve relative paths (`../xxx`) management;
- Get URLs for `.js.map` files that differ from the `.js` filename;
- Various bug and typo fixes.
:::
### Version 1.0
Initial release.