# Autonolas Mech Tools observations
They have 5 different ways of making predictions:
1. Requesting completions from GPT, embedding summarized information from external sources into the prompt (roughly sketched after this list)
2. Requesting completions from Claude, embedding information from external sources into the prompt
3. Requesting completions from GPT, using embeddings and more advanced RAG for the prediction prompt
4. Requesting completions from GPT, using a Subject Matter Expert (SME) persona, generated at runtime, as the system message of the prediction request. They fetch and summarize additional info just like in the first approach, but they don't use the SME persona to generate the external-info queries
5. Same as the first, but they use the LLM to iteratively prompt-engineer the prediction prompt
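To make the first approach concrete, here is a minimal sketch of the flow, under my own assumptions: summarized external info gets interpolated into a prediction prompt and sent to GPT. The prompt wording, model name, and output fields are placeholders of mine, not the repo's exact values.

```python
# Rough sketch of approach #1: embed summarized external info into a prediction
# prompt and request a completion. Prompt wording, model name, and output fields
# are illustrative placeholders, not the repo's exact values.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PREDICTION_PROMPT = """\
You are asked to estimate the probability that the following event occurs.

QUESTION:
{question}

ADDITIONAL_INFORMATION (summarized from web search results):
{additional_information}

Respond only with a JSON object containing "p_yes", "p_no", "confidence"
and "info_utility", each a float between 0 and 1.
"""

def predict(question: str, additional_information: str) -> str:
    """Send the prediction prompt with the external info embedded and return the raw completion."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[
            {"role": "system", "content": "You are an LLM inside a multi-agent system."},
            {
                "role": "user",
                "content": PREDICTION_PROMPT.format(
                    question=question,
                    additional_information=additional_information,
                ),
            },
        ],
        temperature=0.0,
    )
    return response.choices[0].message.content
```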
I believe all but #3 are pretty simple.
A few observations:
- [The LLM predicts not mainly based on the information provided from external sources, but from its training data](https://github.com/valory-xyz/mech/blob/main/tools/prediction_request_claude/prediction_request_claude.py#L55), and it treats the externally gathered information as "additional" context that may or may not be useful for making the prediction. **This is a huge assumption on their part** IMHO, and one we could stress-test
- They instruct the LLM to [grade the utility of the information provided from external sources](https://github.com/valory-xyz/mech/blob/main/tools/prediction_request_claude/prediction_request_claude.py#L80) for making the prediction, and also to report a [confidence score for its prediction](https://github.com/valory-xyz/mech/blob/main/tools/prediction_request_claude/prediction_request_claude.py#L78) (see the output-schema sketch after this list)
- [For the research, they use the LLM to generate internet search queries](https://github.com/valory-xyz/mech/blob/main/tools/prediction_request_claude/prediction_request_claude.py#L111), the technique I mentioned that [GPT Researcher also uses](https://github.com/assafelovic/gpt-researcher?tab=readme-ov-file#architecture) (sketched after this list).
- Interestingly, their system prompts are often just [placeholders](https://github.com/valory-xyz/mech/blob/main/tools/prediction_request/prediction_request.py#L241)
- [They use Google Search API directly](https://github.com/valory-xyz/mech/blob/main/tools/prediction_request_claude/prediction_request_claude.py#L123), with no SerpAPI-style wrapper in between (see the sketch after this list). This may (negatively) impact the quality of their web results.
- They don't use the LLM to discriminate which web results are worth scraping; [they scrape all of them](https://github.com/valory-xyz/mech/blob/main/tools/prediction_request_claude/prediction_request_claude.py#L224) (see the scraping sketch after this list).
- They [summarize algorithmically](https://github.com/valory-xyz/mech/blob/main/tools/prediction_request/prediction_request.py#L307), not with vector search or LLM summarization (a generic example of that style follows the list)
- They don't use function calling, just prompt engineering.
- A very basic AI-generated native transfer tx: https://github.com/valory-xyz/mech/blob/main/tools/native_transfer_request/native_transfer_request.py
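On the graded output mentioned above: my recollection is that the prediction tools ask the model to return a JSON object with `p_yes`, `p_no`, `confidence` and `info_utility` fields. A small parsing sketch, with field names treated as approximate and to be double-checked against the repo:

```python
# Parsing the JSON the prediction tools ask the model to return.
# Field names are my recollection of the repo's schema; treat as approximate.
import json

def parse_prediction(raw_completion: str) -> dict:
    """Parse the model's JSON answer and sanity-check the probability fields."""
    result = json.loads(raw_completion)
    for key in ("p_yes", "p_no", "confidence", "info_utility"):
        value = float(result[key])
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{key} out of range: {value}")
        result[key] = value
    return result

# Example:
# parse_prediction('{"p_yes": 0.62, "p_no": 0.38, "confidence": 0.7, "info_utility": 0.4}')
```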
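The LLM-generated search queries step, sketched under my own assumptions (the prompt wording and model name are placeholders, not the repo's):

```python
# Illustrative sketch of LLM-generated web search queries (not the repo's exact prompt).
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate_search_queries(question: str, n: int = 4) -> list[str]:
    """Ask the model for n web search queries that would help answer the question."""
    prompt = (
        f"You must answer this question: {question!r}.\n"
        f"Propose {n} distinct web search queries that would surface relevant, "
        "recent information. Respond only with a JSON array of strings."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    return json.loads(response.choices[0].message.content)
```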
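The direct Google search call: a minimal equivalent using the public Custom Search JSON API via `requests`. The credentials are placeholder environment variables, and the repo's own plumbing may differ.

```python
# Minimal direct call to the Google Custom Search JSON API (no SerpAPI-style wrapper).
# GOOGLE_API_KEY and GOOGLE_ENGINE_ID are placeholders for real credentials.
import os
import requests

def google_search(query: str, num: int = 5) -> list[str]:
    """Return result URLs for a query via the Custom Search JSON API."""
    response = requests.get(
        "https://www.googleapis.com/customsearch/v1",
        params={
            "key": os.environ["GOOGLE_API_KEY"],
            "cx": os.environ["GOOGLE_ENGINE_ID"],
            "q": query,
            "num": num,
        },
        timeout=10,
    )
    response.raise_for_status()
    return [item["link"] for item in response.json().get("items", [])]
```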
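Scraping every result with no LLM-based relevance filtering, again an illustrative sketch rather than their code:

```python
# Scraping every result URL without any LLM-based filtering (illustrative only).
import requests
from bs4 import BeautifulSoup

def scrape_all(urls: list[str], timeout: int = 10) -> dict[str, str]:
    """Fetch each URL and return its visible text, skipping pages that fail."""
    texts: dict[str, str] = {}
    for url in urls:
        try:
            html = requests.get(url, timeout=timeout).text
        except requests.RequestException:
            continue  # no relevance filtering, but broken pages are skipped
        soup = BeautifulSoup(html, "html.parser")
        texts[url] = soup.get_text(" ", strip=True)
    return texts
```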
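And the "algorithmic" summarization: something extractive and frequency-based rather than semantic. A generic sketch of that style of approach, not their actual implementation:

```python
# Generic extractive (non-LLM, non-embedding) summarization sketch.
# An illustration of the style of approach, not the repo's actual code.
import re
from collections import Counter

def extractive_summary(text: str, max_sentences: int = 5) -> str:
    """Keep the sentences whose words are most frequent overall, in original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freqs = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freqs[t] for t in tokens) / (len(tokens) or 1)

    top = sorted(sentences, key=score, reverse=True)[:max_sentences]
    return " ".join(s for s in sentences if s in top)
```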
Overall, I believe the research process is very basic. I don't see them benchmarking against anything concrete, so I really doubt these are the best approaches or the ones that would yield the best results. I think there's **plenty** of room for improvement. We could even improve how the prediction scores are calculated, using more principled math than they currently do.
There are still a few unknowns: I don't know what the system constraints are in terms of time and cost per prediction request.