# Chomp Ideas Feedback

Feedback on https://gist.github.com/wycats/159ca6232d12c2900f102e34c113a8f3.

## Extension Improvements

### Source File Location + Extensions Importing Other Extensions

* `ENV.CHOMP_EXTFILE`

Seems sensible!

* `ENV.CHOMP_EXTDIR`

Does this mean that `addExtension` would be relative to this path? That sounds good to me.

> That seems sensible to me. I was originally looking for the minimal change I could make to do what I needed manually, but this seems better.
> [name=@wycats]

Would it also work as a top-level for resolving bare extension names ie when invoking the main Chomp task?

> As long as it didn't become synonymous with `cwd` (i.e. extensions and child Chompfiles would default to using their dir as `CHOMP_EXTDIR`), I think that makes sense.
> [name=@wycats]

> I guess I'm thinking something like a rudimentary module resolver, where this is the directory the "bare specifiers" get looked up, but relative paths etc are all still against the CWD.
> [name=@guybedford]

#### Exporting Extensions

Is the use case here eg being able to define private inner templates without conflict? Would having template names scoped to their parent extension help for encapsulation? And template reexporting / renaming? Or does this miss useful extension features that need to be exported explicitly?

> The primary use-case I had in this section was an extension being able to use another extension in its implementation without exposing that fact in some way to the Chompfile that used the template.
>
> The "exporting extensions" section was meant to distinguish between abstracting internal use of an extension (which I think makes a lot of sense and is a gap in the current design), and having an extension give you other extensions to import. An example of that use-case would be an extension that composes a bunch of external and internal extensions for ergonomics, but wants to expose all of its components for advanced use.
>
> I think this is much less important than the ability for an extension to use extensions in its implementation without consequences for the Chompfiles (or other extensions) that create tasks with the template. [name=@wycats]

> Ideally extensions themselves would use ES modules instead of scripts, allowing the ability to eg share functions and state between extensions. Would something like that cover the use cases missed by treating this purely as template name encapsulation? [name=@guybedford]

## Workspace Improvements

First-class workspace support would be great.

> This was probably my original motivation for writing up my thoughts. I discovered that the current recommendation doesn't compose `watch` correctly, and I chose `chomp` in the first place to have a single root script that (a) compiled my packages, (b) in a cache-aware way, (c) that fed into a `watch` system.
>
> For compiling packages in a workspace, it basically works fine to treat the entire workspace as a single package (as long as the target files go into directories that are relative to the package root).
>
> The only issue I had with ergonomically doing *that* was the limitation on interpolations (I needed `packages/#src/##.ts` -> `packages/#src/dist/##.js` or something like that).
>
> I currently enumerate each package in the root Chompfile and wrote a local extension to keep the per-package config short. [name=wycats]

> Supporting extended target syntax seems like a great place to start to me. I agree we need to make this more flexible, and am open to extending the syntax in this fashion. [name=guybedford]

I wonder if even instead of an explicit import or new semantic, we could somehow automatically pick up all `chompfile.toml`'s and work out how to do the right thing, by creating a super graph over the individually instanced task graphs.

> Something like this makes sense to me, and I like it as a goal.
> That said, I think that it would probably want to be built on something more like the manual primitive anyway, so I think it makes sense to make it possible with manual references, and then specify the automatic inference in terms of manual import. Does that make sense? [name=@wycats]

> Interesting, approaching graph combinations as reinterpretations at the syntax level based on well-defined rebase rules sounds like it might well abstract the problem, although I can't fully verify all the edge cases right now. Treating it as something to reexamine later as an option towards that workspace goal sounds good to me. [name=@guybedford]

This may well be harder than your proposal as it involves effectively reinstancing the right parts in core, but could be very nice if it could work out.

> I wonder if the layered design I suggested above would reduce the amount of necessary changes. I haven't read the source code carefully yet (beyond cases where I needed to read it to understand something), so I'm just guessing atm. [name=@wycats]

> You may well be right [name=@guybedford]

Something like defining a workspace in the chompfile itself:

```toml
[workspace]
workspace = ["packages/*"]
```

Then loading up all the graphs, and then being able to run operations against specific packages, the main package, or multiple packages in parallel, with specific workspace CLI commands. Within each task graph, templates, extensions, etc would all be uniquely instanced to ensure full isolation. Super task batching would be its own consideration of course, possibly as sharding or weighting of the total pool count. Paths would be delegated to their individual task graphs based on path ownership via path scoping (eg in watching / cross-dependencies / etc).

> I like this design a lot (I wonder if we could piggy-back on yarn and/or pnpm workspaces somehow).
> I think both designs rely on some kind of scoping primitive (namespacing is the least intrusive addition, but it's not the only option), so we may as well dig into that design question first either way. [name=@wycats]

> Yes, so to argue the other side, there are a bunch of local assumptions being made in the current operation (eg what if template names clash between two chomp environments / versions etc). And those can be avoided by treating it as a full reinstantiation of the entire Runner, where you effectively then have multiple Runners. Ensuring pooling, file ops, and watching is properly managed then becomes the problem there rather. I'm open to both approaches, and certainly perhaps something to work out later. [name=@guybedford]

Happy to discuss your specific proposal further, but thought I'd throw this one out first.

## Execution Improvements

### Clean

Very cool idea, never thought of this, but yeah we should have all the information to implement this. Would love to see that.

> This could also be an extension point for third-party
> extensions. For example, the `targets` returned by a registered
> task could be a function that takes a dep filename and returns
> a target filename.
>
> This may be a good thing to support regardless of whether the
> `clean` proposal is accepted.

Can you go into some more detail on this, and why it would be needed? Is it for files that exist outside of the task graph? Or because the extension has a better concept of intermediate files that despite existing in the graph can be removed?

> I found that when shelling out to third-party libraries, it was frequently easy enough to characterize the target file, but pretty annoying to require at the task level. Doing it at the task level implies that you can control the output location, and sometimes you just can't (or for one-off extensions, don't want to have to support teaching a third-party tool about configurable outputs).
>
> Also, I found it frequently the case that a single dep emitted multiple outputs (`.js` and `.js.map` or `.js`, `.js.map`, `.d.ts`). For caching purposes, it's normally sufficient to identify a single output that's strictly more likely to change than any other file.
>
> However, it's not always possible. For example, a compilation from `.ts` to `.js` and `.d.ts` may update the `.js` file, the `.d.ts` file or both. In this situation, you need a 1:N mapping for caching to work correctly.
>
> For **clean**, you could keep a manifest of all generated target files on each run (and update it incrementally), and that would be sufficient. However, you might want the ability to say something like "clean the targets of this source file", and for *that*, you need a complete mapping from source files to target files.
>
> If your task shells out to a third-party tool to handle multiple sources at once, it may be impossible to infer the answer. [name=@wycats]

> Thanks, as part of extending the internal task system, I agree multi-target is useful, and something that should be added. [name=@guybedford]

> Another standard clean technique is just to specially define globs that cover generations adequately. Something along the lines of:

```toml
[[task]]
name = "clean"
deps = ["dist/**/*", "obj/*.a"]
trace-deps = false
run = "rm $DEPS"
```

> Where `trace-deps = false` disables generation of deps through the task graph expansions so you don't build things and then clean.
> Discussing verbose output below.
> [name=@guybedford]

### Verbose Operations

I wonder if this could be achieved with some kind of logging pattern instead if extensions carefully followed the conventions?

> That sounds like a great idea. Adding logging features to the global exposed to extensions seems aligned with the current design.
>
> I mentioned "dry run" in the original gist because I think that we would probably want the logging pattern to support the "dry run" pattern, where you tag a block of code with a log entry, and you can emit the log entry without running the block of code.
>
> For dry runs, you need the ability to emit verbose logs even if you *wouldn't* under normal execution (because the log is the only thing that happens), and I think it goes beyond just another log level. [name=@wycats]

> Ok, so rather than a logging system, it is more like a complete "annotation" system of work being done, then in theory dry runs would work out. But only so far as the system was followed. The benefit of following the pattern would of course be dry run support, clean support, really nice logging support with colouring etc. So there should be enough of an incentive for extensions. Then you would want this information output to a file for clean support I think, so you can reuse the previous record, which then makes me wonder if something like this isn't maybe more something we should design as being an extension itself that is possible to plug in? Would be great to look at a concrete proposal certainly. [name=@guybedford]

### Uncached status

Seems I missed this one, more display options are always welcome. I'm happy to explore the configuration for things that don't introduce a huge amount of architectural overhead.

## Template Improvements

### File Parameters

Would the template somehow define the option type as being a file here? Would this file itself participate in the task graph? If it did, what would it invalidate? All tasks of the template?

> The template itself wouldn't need to know that it was accepting a file. It would take a normal string, and the Chompfile runner would read the file and pass it to the template. The use-case for this is wanting to pass the contents of an external file to a template that takes something like a `tsconfig` string as a parameter.
>
> I think that the file would become a dependency of the task that passed it as a parameter, and changes to the file would invalidate the task. [name=@wycats]

> Is `package` or `file` a special name here that does the "file" marking? I'm just still unclear on how this information is being obtained.
> Relatedly - I'm entirely open to rewriting the entire extension model as it's still highly experimental, and having higher level discussions around how it should work. I do have residual interest in doing some heavy lifting on this model as it's probably the part I'm most interested in working on further for the project. [name=@guybedford]

## Path Improvements

### Mapping a Single Dep to Multiple Targets

If we can do this somehow consistently, I'm open to it.

> I think that 1:N mappings fit cleanly into the current model. Whenever the source file changes, all of the target files are invalidated. That said, implementation details might complicate things. [name=@wycats]

> Sounds good. [name=@guybedford]

### Named interpolations

This is actually a really nice concept. Perhaps using `[name]` syntax instead? eg -

```toml
dep = "packages/#[scope]/#[pkg]/src/##.ts"
target = "packages/#[scope]/#[pkg]/dist/##.ts"
```

> Works for me! At first glance, I wasn't sure if multiple `##`s pose additional complexity or make things hopelessly ambiguous, and the use-cases I had for this feature didn't require them, so I think "one ##, multiple #"s is a good scope for this feature. Any thoughts on whether multiple `##`s make sense and what questions they might raise? [name=@wycats]

> I guess `##` could be supported just as well here, although perhaps we should just restrict to only one `##` in a given path or something like that? The nice thing about the current single instance of either `#` or `##` is that it's very easy to take a path and work out which interpolate it comes from and with what substitution, but it's just working through the rules for the most part I think.
> Another constraint that we could even consider adding is that if two interpolation patterns both match a path for a target interpolation, that we automatically throw as that's clearly an ill-formed mapping case. Perhaps that's the main constraint to maintain? [name=@guybedford]

### Interpolation scoping

If we used the syntax I suggested above, can that satisfy this same use case?

```toml
dep = "packages/@#[scope]/#[pkg]/src/##.ts"
target = "packages/@#[scope]/#[pkg]/dist/##.ts"
```

> The main goal is to allow restricting the glob (in this case, to directories that start with `@`, but the restriction could really be anything), but still have a 1:1 mapping into the target. This example works because you can externalize the `@` and then repeat it in the target. Since the scope is just removing some directories from the dep list, I thought that you should be able to express it as a pattern and not need to find a way to repeat it in both the dep and target.
>
> Perhaps I'm thinking about it wrong. [name=@wycats]

> If the above works, then supporting `#[name]` as having prefixes or suffixes before or after that `/` seems fine to me. `##` could still avoid these. At least it gives an argument for supporting that though. [name=@guybedford]

### Unambiguous Path Prefix

If we had something like the workspaces proposal above, then this seems like it handles some of those things.

This does make me wonder how to run tasks in a subpackage of a workspace that wants to share tasks with the parent workspace though. Perhaps one could define sub-package as belonging to the parent to bring that workspace graph into play:

```toml
workspace = "../"
```

ie `workspace` exactly acts as a supergraph inclusion process, creating a new task graph for each sub-workspace. Because this is done declaratively upfront and recursively, we immediately have the ability to delegate ownership of paths to their appropriate chompfile scopes. Chompfile relative over cwd relative as the model then?
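
To make the "delegate ownership of paths" idea a bit more concrete, here is a rough sketch of one way it could work: each path goes to the deepest Chompfile directory that contains it. This is illustrative Python only, not Chomp code; `owning_scope` and the `scope_dirs` list are hypothetical names, and it assumes all paths are expressed relative to the workspace root.

```python
from pathlib import PurePosixPath
from typing import List, Optional


def owning_scope(path: str, scope_dirs: List[str]) -> Optional[str]:
    """Return the deepest scope directory containing `path`, or None.

    `scope_dirs` is a hypothetical list of directories that each hold a
    chompfile, relative to the workspace root ("." is the root itself).
    """
    target = PurePosixPath(path)
    best: Optional[PurePosixPath] = None
    for scope in scope_dirs:
        scope_path = PurePosixPath(scope)
        # Compare whole path segments, so "packages/a" never claims
        # files under "packages/ab".
        contains = target == scope_path or scope_path in target.parents
        if contains and (best is None or len(scope_path.parts) > len(best.parts)):
            best = scope_path
    return None if best is None else str(best)


scopes = [".", "packages/a", "packages/b"]
print(owning_scope("packages/a/src/index.ts", scopes))  # packages/a
print(owning_scope("scripts/build.ts", scopes))         # .
```

With a rule like this, a watcher event or cross-package dependency would be routed to exactly one task graph, and the root chompfile only sees paths that no sub-package claims.
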
If I'm missing details you've captured though we should discuss this further.

## Chompfile Structure Improvements

### Task Name in Table as alternative to `[[]]`

```toml
[tasks."dts:mdit"]
template = "dts"
template-options = { root = "packages/@jsergo/mdit" }
```

All for it.

> 🙌 [name=wycats]

### Nest Template Config

```toml
[[task]]
name = "dts:mdit"
template.dts = { root = "packages/@jsergo/mdit" }
```

Again, seems a nice simplification.

> 🙌 [name=wycats]

## Model Improvements

### `invalidation: "digest"`

I wonder if it might not make sense to do this by default whenever a task is run that will overwrite an existing file? Effectively as a special type of mtime exception - where we take the mtime, rerun the task, and if the digests match, we keep the old mtime instead of the new file system mtime. We could possibly even update the mtime on the file system itself too effectively as an automatic protection against destructive writes.

Agreed it might make sense to toggle this, but the point is it seems like it might just be an "optimization" of the mtime invalidation as opposed to a new one.

## More Builtins

- Remove files and directories

Do you mean as a template?

## Node Engine Improvements

- Document the `node` engine environment
- Access to glob

Open to making the Node.js engine maximally useful. That said I believe it does run in the context of the current Chompfile so in theory one could just have devDependencies for this kind of thing.
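
Coming back to the `invalidation: "digest"` point under Model Improvements, the "keep the old mtime if the digests match" idea could look roughly like the sketch below. It is illustrative Python rather than anything from Chomp's core; `run_preserving_mtime` and `file_digest` are hypothetical helpers, and SHA-256 is just an assumed choice of digest.

```python
import hashlib
import os
from typing import Callable


def file_digest(path: str) -> str:
    """SHA-256 of the file contents (assumed digest; any stable hash works)."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def run_preserving_mtime(target: str, task: Callable[[], None]) -> bool:
    """Run `task`, which (re)writes `target`.

    If the target existed before and its contents are byte-identical
    afterwards, restore the previous mtime on disk so downstream
    mtime-based checks do not see a change. Returns True when the
    content actually changed (or the target is new).
    """
    if not os.path.exists(target):
        task()
        return True

    before = os.stat(target)
    old_digest = file_digest(target)
    task()

    if file_digest(target) == old_digest:
        # Destructive write with identical output: put the old times back.
        os.utime(target, ns=(before.st_atime_ns, before.st_mtime_ns))
        return False
    return True
```

The point of framing it this way is that dependants keep comparing mtimes exactly as they do today; the digest is only consulted at the moment a task overwrites an existing file.
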