# Why the solver fails easily

## How to trigger the failure of the service

Start with the [solver-service](https://github.com/moyodiallo/solver-service/tree/solver-could-fail-easily) repository, to which we add an example, `examples/main_v2.ml`, that sends a wrong request to the `solver-service`.

To run the `solver-service`, refer to the `README` file of `solver-service`. For convenience, the relevant part of that README is copied in the next subsection.

-----------

##### A solver service

The `./examples` directory contains a small CLI tool for testing the solver service over a TCP connection. It requires a package name and version to solve for (note it will use `opam` to fetch the opam file information).

To test it, you first must install and run the solver service. It spawns workers by recursively calling itself (using `Sys.argv.(0)`) so it is important to run it using its proper name rather than with `dune exec --`.

```sh
$ solver-service --address=tcp:127.0.0.1:7000
Solver service running at: <capnp-address>
```

Copy the `<capnp-address>` and run the example binary passing in the address.

```sh
$ dune exec -- ./examples/main.exe --package=yaml --version=3.0.0 <capnp-address>
```

--------------------

A correct request can be sent with the default example, `dune exec -- ./examples/main.exe .....`; with it you will see the solved dependencies.

After the wrong request is sent with `dune exec -- ./examples/main_v2.exe .....`, all requests sent afterwards stay pending (waiting with no answer). The only thing wrong in that request is the URL of `opam-repository` ([see this comment](https://github.com/moyodiallo/solver-service/blob/dad7eca364dc9ed91498a2c9b7a178a8b63b38c9/examples/main_v2.ml#L72-L82)).

## Design of the solver-service and the issue

The solver-service is in control of creating each new `Epoch` (a group of processes).
Each time it receives a request, a new `Epoch` is created if the opam-repository commit differs from the previous one (once all the processes of the previous `Epoch` have ended). If the opam-repository commit is the same, the request is distributed within the current `Epoch`.

An `Epoch` is a pool of processes among which the service distributes requests. A request is sent to a process in the pool via a channel, and the service waits for the response via a channel too.

~~[exchange via channel](https://github.com/ocurrent/solver-service/blob/4517ee45a4d7090109d22fdebf0de60449978e44/src/solver-service/lib/service.ml#L77-L79) When a process is waiting for a response via a channel and never finishes (for example, because the pool process fails), a later request with a different opam-repository commit will never be processed. All requests being processed in the current `Epoch` must finish before a new one can be created.~~

The failure described above happens while the solver is creating the first `Epoch`. The `Epoch_lock` is in the `Activating` state while running the function that is supposed to clone `opam-repository` (with the wrong URL), and that function fails. On any later request, the solver tries to drain the old `Epoch` (`Draining`) and create a new one, but the old `Epoch` is already stuck in the `Activating` state.

## A solution

The fix is to catch every error raised during the `Activating` state. If one occurs, handle it so that the lock does not stay blocked in that state.
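
The idea can be illustrated with a small, synchronous model of an epoch lock. This is only a sketch under stated assumptions: the real `Epoch_lock` in solver-service is asynchronous and has more states (such as `Draining`), and the names `state`, `create`, `with_epoch` and `activate` below are hypothetical, chosen for illustration rather than taken from the repository. The point it shows is that a failure in the activation step must move the lock out of `Activating` before the error is reported.

```ocaml
(* Simplified model of an epoch lock. All names are hypothetical. *)
type 'a state =
  | Idle                   (* no epoch exists yet, or the last activation failed *)
  | Activating of string   (* an epoch for this commit is being set up *)
  | Active of string * 'a  (* commit of the current epoch and its worker pool *)

type 'a t = { mutable current : 'a state }

let create () = { current = Idle }

(* [with_epoch t commit ~activate f] runs [f] on the pool for [commit].
   The pool is created with [activate] when no matching epoch exists.
   [activate] stands in for the step that clones opam-repository and
   may raise, for example when the URL is wrong. *)
let with_epoch t commit ~activate f =
  match t.current with
  | Active (c, pool) when c = commit -> Ok (f pool)
  | Activating _ -> Error "activation already in progress"
  | Idle | Active _ ->
    t.current <- Activating commit;
    match activate commit with
    | pool ->
      t.current <- Active (commit, pool);
      Ok (f pool)
    | exception ex ->
      (* Key step: leave [Activating] so later requests are not blocked. *)
      t.current <- Idle;
      Error (Printexc.to_string ex)

(* Usage: a failing activation is followed by a successful one. *)
let () =
  let lock = create () in
  let failing_clone _commit = failwith "cannot clone opam-repository" in
  let working_clone commit = "pool-" ^ commit in
  (match with_epoch lock "c1" ~activate:failing_clone String.length with
   | Error msg -> Printf.printf "first request failed: %s\n" msg
   | Ok _ -> print_endline "unexpected success");
  (match with_epoch lock "c2" ~activate:working_clone String.length with
   | Ok n -> Printf.printf "second request served, result %d\n" n
   | Error msg -> Printf.printf "second request failed: %s\n" msg)
```

Without the `exception` case, the first request would leave the lock in `Activating` forever, which mirrors the pending requests described above. In the asynchronous service the same reset would have to happen in whatever error handler wraps the activation promise.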