ChatGPT has brought about a considerable transformation in programming and research. This article will delve into various prompts that can be immensely beneficial for a security auditor.
To ensure OpenAI doesn't use your conversations for model training while you audit, remember to disable the corresponding checkbox in ChatGPT's data-controls settings.
As you may have anticipated, GPT has its limitations when it comes to input size. We also encountered an unexpected bug in GPT-4: it silently disregarded the lower part of the input without any indication of an error. For instance, when we fed a large contract into GPT-3.5 and asked it to disregard everything above and just answer "1+1=?", it provided the correct answer. GPT-4, however, simply summarized the top part of the input and completely overlooked the question. This is dangerous: GPT-4 appears to deliver an analysis while silently cutting off your request, missing crucial information at the bottom.
Beware!
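Since the truncation is silent, it's worth counting tokens yourself before pasting anything in. Below is a minimal sketch using the `tiktoken` library; the context limits are the ones documented for the original gpt-3.5-turbo and gpt-4 models (newer variants differ), and the contract file name is a placeholder, not the one from our experiment.

```python
import tiktoken

# Documented context windows for the original models; newer variants differ.
CONTEXT_LIMITS = {"gpt-3.5-turbo": 4096, "gpt-4": 8192}

def fits(text: str, model: str = "gpt-4") -> bool:
    """Return True if `text` fits into the model's context window."""
    enc = tiktoken.encoding_for_model(model)
    return len(enc.encode(text)) <= CONTEXT_LIMITS[model]

contract = open("Staking.sol").read()  # hypothetical contract file
if not fits(contract):
    print("Too large: split the contract into functions before prompting")
```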
The most obvious way to use ChatGPT is to ask it to discover vulnerabilities in the code.
Let's try doing this for a small staking contract [→ see code].
The contract had several issues that we would like to identify via GPT:
1. An issue in `unstake()` when dealing with a large stake.
2. An issue with `_reward` and `_amount` in `unstake()`.
3. Calling `stake()` twice overwrites the user's previous stake (a minimal model of this issue is sketched below).
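To make the third issue concrete, here is a tiny Python model of the buggy pattern. It is an illustrative stand-in, not the audited contract:

```python
# Hypothetical model: stake() assigns instead of accumulating, so a second
# call silently overwrites the user's previous stake.
stakes: dict[str, int] = {}

def stake(user: str, amount: int) -> None:
    stakes[user] = amount  # BUG: should be stakes[user] = stakes.get(user, 0) + amount

stake("alice", 100)
stake("alice", 50)
assert stakes["alice"] == 50  # the first 100 tokens of stake are lost
```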
We presented the prompt as follows: "Find vulnerabilities" → "More" → "More".
The result:
| Version | Vuln #1 | Vuln #2 | Vuln #3 | False alarms |
|---|---|---|---|---|
| GPT-3.5 | – | – | – | 18 |
| GPT-4 | FOUND | – | FOUND | 20 |
Both GPT-3.5 and GPT-4 raised several false alarms, with GPT-3.5 performing the worst. However, GPT-4 impressively identified two out of the three vulnerabilities! Great job!
In summary, GPT can be effectively used as a scanner for small, isolated contracts or functions, thereby enhancing the quality of audits. However, brace yourself for a considerable number of false alarms and don't expect it to uncover all vulnerabilities.
Keep in mind that applying this method to large contracts poses challenges: the input-size limits and silent truncation described above mean you should feed GPT small, self-contained pieces of code.
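One practical workaround is to scan one small, isolated function at a time. Below is a minimal sketch of such a loop, assuming the official `openai` Python package (v1 interface); the prompt wording and model name are placeholders, not the exact ones used in our experiment.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = "Find vulnerabilities in this contract function:\n\n{code}"

def scan_functions(functions: list[str], model: str = "gpt-4") -> list[str]:
    """Ask the model about each function separately to stay under input limits."""
    findings = []
    for source in functions:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": PROMPT.format(code=source)}],
        )
        findings.append(resp.choices[0].message.content)
    return findings
```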
At the outset of an audit, your primary mission is to grasp the inner workings of the protocol at a high level. With the aid of GPT, you possess a powerful tool to streamline complex code: clarify variable names, remove unnecessary comments or events, and shrink the code by substituting known argument values. All of this accelerates your immersion into the protocol.
Let's embark on this with a practical example: simplifying the `liquidate_extended()` and `_liquidate()` functions from crvUSD [→ see code].
The code of these functions is fed into GPT-4, and the prompts below are applied one by one, watching the result (a sketch of scripting this chain appears after the list):
1. Merge _liquidate() into liquidate_extended()
2. Simplify the code knowing that the argument `user` is always equal to `msg.sender`
3. Simplify the code knowing that the argument `use_eth` is always equal to `true`
4. Simplify the code knowing that the argument `frac` is always equal to `10**18`
5. Simplify the code knowing that the argument `min_x` is always equal to `0`
6. Strip types
7. Replace `xy = ...` with `stablecoin_amount, collateral_amount = ...`
8. Simplify the code by adding a `require(debt > stablecoin_amount)`
9. Simplify the code knowing that `callbacker` is always `empty(address)`
10. Replace `unsafe_` operations with the respective math ops
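Applying these prompts by hand in the ChatGPT UI works, but the chain is also easy to script so that every answer stays in the conversation context. A minimal sketch, assuming the official `openai` Python package (v1 interface) and a hypothetical source file:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SOURCE_CODE = open("controller.vy").read()  # hypothetical path to the functions

PROMPTS = [
    "Merge _liquidate() into liquidate_extended()",
    "Simplify the code knowing that the argument `user` is always equal to `msg.sender`",
    # ...the remaining prompts from the list above, in order
]

# Keep the whole dialogue so each simplification builds on the previous one.
messages = [{"role": "user", "content": "Here is the code:\n\n" + SOURCE_CODE}]
for prompt in PROMPTS:
    messages.append({"role": "user", "content": prompt})
    resp = client.chat.completions.create(model="gpt-4", messages=messages)
    answer = resp.choices[0].message.content
    messages.append({"role": "assistant", "content": answer})
    print(answer)  # inspect every intermediate version
```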
This is what we get in the end:
```
def liquidate_extended():
    """
    Perform a bad self-liquidation if health is not good
    """
    debt, rate_mul = self._debt(msg.sender)
    stablecoin_amount, collateral_amount = AMM.withdraw(msg.sender, self._get_f_remove(10**18, 0))
    require(debt > stablecoin_amount)
    STABLECOIN.transferFrom(AMM.address, self, stablecoin_amount)
    to_repay = debt - stablecoin_amount
    self._withdraw_collateral(msg.sender, collateral_amount)
    STABLECOIN.transferFrom(msg.sender, self, to_repay)
    self.redeemed += debt
    self.loan[msg.sender] = Loan({initial_debt: 0, rate_mul: rate_mul})
    self._remove_from_list(msg.sender)
    d = self._total_debt.initial_debt * rate_mul / self._total_debt.rate_mul
    self._total_debt.initial_debt = max(d, debt) - debt
    self._total_debt.rate_mul = rate_mul
```
The resulting code is remarkably more comprehensible than the original version.
Now we can scroll up our dialogue with GPT-4 and modify certain prompts. For example, we could ask it to simplify the code for the case `callbacker == msg.sender` to see a different code flow for this function.
These simplifications can be highly beneficial during the initial stages of an audit, especially when you need to quickly understand the high-level workings of the protocol.
Harness GPT to unlock the secrets of invariants, especially when confronted with formidable tasks like navigating vast functions and unraveling code paths with significant ramifications.
Let's consider the `adjustTrove()` and `_adjustTrove()` functions from Ethos (a clone of LUSD) as an example [→ see code].
Suppose we seek to uncover scenarios where the fee eludes the grasp of the caller, so we prompt GPT-4 to find the conditions under which the fee is charged.
GPT-4 accurately suggests that the fee is only levied under specific conditions: `_isDebtIncrease=true && isRecoveryMode=false`.
Great!
With the ability to automatically identify logical paths with specific consequences, you can quickly and effortlessly check your suspicions about various vulnerabilities.
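Once GPT names a path condition, it costs little to sanity-check it exhaustively. The toy sketch below enumerates all combinations of the two flags; the `fee_charged` predicate is a hypothetical stand-in encoding GPT's answer, not Ethos's actual code.

```python
from itertools import product

def fee_charged(is_debt_increase: bool, is_recovery_mode: bool) -> bool:
    # Encodes the condition GPT-4 reported, not the protocol's real logic.
    return is_debt_increase and not is_recovery_mode

# Enumerate every flag combination to confirm the claimed path condition.
for d, r in product([False, True], repeat=2):
    print(f"_isDebtIncrease={d}, isRecoveryMode={r} -> fee charged: {fee_charged(d, r)}")
```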
An auditor's prowess soars with each challenging question they pose and answer, for it enriches the tapestry of the audit's quality. Why not harness GPT's intellect to generate questions for us? We're not after mundane checklists; we crave astute inquiries that are relevant and easily verifiable.
Behold the magic template we conjured:
Imagine you are a security researcher and you are auditing [DESCRIBE THE PROJECT]. [DESCRIBE TECHNICAL DETAILS]. To find vulnerabilities in the project, you must read these functions and ask three of the most important edge-case questions about them. This will help you identify bugs or vulnerabilities. What would those three questions be? Ask very specific questions and provide suspicious arguments and code path you want to check.
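If you reuse the template across projects, it's easy to parameterize. A minimal sketch follows, where the project description, technical details, and file name are placeholder assumptions for illustration:

```python
# The question-generation template from above, with named placeholders.
TEMPLATE = (
    "Imagine you are a security researcher and you are auditing {project}. "
    "{details}. To find vulnerabilities in the project, you must read these "
    "functions and ask three of the most important edge-case questions about "
    "them. This will help you identify bugs or vulnerabilities. What would "
    "those three questions be? Ask very specific questions and provide "
    "suspicious arguments and code path you want to check.\n\n{code}"
)

prompt = TEMPLATE.format(
    project="a lending protocol with a crvUSD-like controller",  # assumption
    details="Loans are created via create_loan() and repaid via repay()",  # assumption
    code=open("merged_functions.vy").read(),  # hypothetical file
)
```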
Yet, heed the delicate balance! The quality of GPT's responses hinges on the quality of the prompt.
Take, for instance, some code from crvUSD. First, we merged several of its functions [1, 2] into one [→ see code], and then applied the template above.
To assess the quality of generation, we divided the resulting questions into four categories: nonsense, lazy, vague, and good specific questions. A nonsense example: a question about `collateral==0`, even though there's a check in the code, `assert collateral * X / Y > 100`, which forbids zero values. A good specific question: `create_loan()` can accept a `collateral` value that doesn't match `msg.value`, and there's a call to an unknown part of the code, `_deposit_collateral(collateral, msg.value)`; GPT wonders whether this unknown function correctly deals with these two values possibly not matching.

Merging several functions, we engaged GPT-4 to weave 15 questions, and here's what we got:
| Generated questions | Quantity |
|---|---|
| Nonsense | 1 |
| Lazy | 1 |
| Vague | 0 |
| Good specific question | 13 |
Great result!
We've unveiled some techniques that accelerate your understanding of a protocol and elevate the grandeur of your audits. The realm of AI is a burgeoning landscape, and fear not, for mastery of this amazing tool shall lead you to extraordinary feats. So, embrace the power of GPT, and together, we shall embark on an endless journey of discovery!