# DABstep evaluation

Before fixing the `open` issue in the agent:

| Model | Easy accuracy |
| -------- | -------- |
| Qwen/Qwen2.5-32B-Instruct | 62.50% |
| Qwen/Qwen2.5-32B-Instruct-AWQ | 59.72% |
| Qwen2.5_32B_sft_csot_alpaca_32b_a63_m50_code_v7_awq | 9.72%\* |

\* Ran on an older version of the library

After fixing `open`, using 14 steps, a better prompt, greedy decoding, and 30k context length:

| Model | Easy accuracy | path |
| -------- | -------- | - |
| Qwen/Qwen2.5-32B-Instruct | 75.00% | `/nas/projects/llm/models/msokolsk/dabstep.b.op.1l2` |
| Qwen/Qwen2.5-Coder-32B-Instruct | 58.33%\* | `/nas/projects/llm/models/msokolsk/dabstep.b.32c.1` |
| Qwen/Qwen2.5-32B-Instruct-AWQ | 69.44% | `/nas/projects/llm/models/msokolsk/dabstep.b.op.1awq` |
| Qwen/Qwen2.5-14B-Instruct | 61.11% | `/nas/projects/llm/models/msokolsk/dabstep.b.op.2` |
| Qwen/Qwen2.5-Coder-14B-Instruct | 25.00% | `/nas/projects/llm/models/msokolsk/dabstep.b.op.3` |
| Qwen2.5_32B_sft_csot_alpaca_32b_a63_m50_code_v7_awq | 15.28%\* | `/nas/projects/llm/models/msokolsk/dabstep.b.op.4.us2` |
| sft_csot_alpaca_32b_a63_m50_code_kd_v1 | 0.00%\* | `/nas/projects/llm/models/msokolsk/dabstep.32pk.1` |

\* Ran with 10k context length

# LATEST RESULTS

After fixing the prompt (payments.csv), with greedy decoding and 10k context length:

| Model | Easy accuracy | path |
| -------- | -------- | - |
| Qwen/Qwen2.5-32B-Instruct | 73.61% | `/nas/projects/llm/models/msokolsk/dabstep.32.2` |
| Qwen/Qwen2.5-32B-Instruct-AWQ | 66.67% | `/nas/projects/llm/models/msokolsk/dabstep.32a.2` |
| Qwen/Qwen2.5-Coder-32B-Instruct | 72.22% | `/nas/projects/llm/models/msokolsk/dabstep.b.32c.2` |
| Qwen/Qwen2.5-14B-Instruct | 66.67% | `/nas/projects/llm/models/msokolsk/dabstep.14b` |
| Qwen/Qwen2.5-14B-Instruct-AWQ | 62.50% | `/nas/projects/llm/models/msokolsk/dabstep.14b.a` |
| Qwen/Qwen2.5-Coder-14B-Instruct | 59.72% | `/nas/projects/llm/models/msokolsk/dabstep.13c.2` |
| Qwen2.5_32B_sft_csot_alpaca_32b_a63_m50_code_v7_awq | 41.67% | `/nas/projects/llm/models/msokolsk/dabstep.us.2` |
| sft_csot_alpaca_32b_a63_m50_code_kd_v1 | 0.00% | `/nas/projects/llm/models/msokolsk/dabstep.32pk.2` |