Planting trees on-chain
Lingering GitHub issues give me heart palpitations, particularly those that have been open for months on end. Sitting like mildew in an otherwise pristine home.
Here's one we've had open since January of this year:
EZKL (for those not in the know) is a library for converting common computational graphs, in the (quasi-)universal `.onnx` format, into zero knowledge (ZK) circuits. This allows, for example, for:

Though our library has improved in its scope of supported models, including transformer-based models (see here for a writeup), GANs, and LSTMs, implementing Kaggle-crushing models like random forests and gradient boosting trees has been challenging.
Part of the issue stems from the way `sklearn`, `xgboost`, and `lightgbm` models are exported to `.onnx`. Instead of decomposing tree-based models into individual operations, like matrix multiplication, addition, and comparisons, the whole model is exported as a single node (see image above)!

Given our library's focus on modularity and composability this has been a bit of an anti-pattern, a proverbial thorn in our side.
This weekend, after yet another call with users asking for a timeline for the implementation of such models, we decided to roll up our sleeves and get it done in 48 hours. Check out a colab example here.
Here's what it took.
Decomposing ugly single node onnx models into individual operations
As noted above, having single node `onnx` graphs is an anti-pattern, something that might destroy our library's clean architecture if we try and accommodate it. A much better approach would be to instead convert the single node graph into its individual operations. Luckily we are not the only folks in history to have been keen to do this, and we landed on the beautifully coded sk2torch library, which takes a graph of this form:

And turns it into something like this:
So much nicer!
For more complex models like random forests we can simply extract the individual trees / estimators, run them through `sk2torch`, and recreate the forest as a pytorch module.
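A rough sketch of this idea (illustrative only: it assumes `sk2torch.wrap` handles the individual decision trees and that the wrapped trees expose `predict_proba`; the `TorchForest` wrapper and the mean aggregation rule are hypothetical, not the exact ezkl implementation):

```python
# Illustrative sketch: rebuild a fitted sklearn RandomForestClassifier as a
# torch module by wrapping each tree with sk2torch and averaging their outputs.
import torch
import torch.nn as nn
import sk2torch


class TorchForest(nn.Module):
    def __init__(self, sk_forest):
        super().__init__()
        # one torch module per fitted decision tree
        self.trees = nn.ModuleList(
            [sk2torch.wrap(est) for est in sk_forest.estimators_]
        )

    def forward(self, x):
        # assumed aggregation rule: average per-tree class probabilities,
        # mirroring sklearn's soft voting for forests
        outs = [tree.predict_proba(x) for tree in self.trees]
        return torch.stack(outs).mean(dim=0)
```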
For `xgboost` and `lightgbm` we leveraged hummingbird, a Microsoft library for converting xgboost into torch / tensor graphs. A converted XGBoost classifier looks like this when exported to onnx:
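A rough sketch of that conversion path (the `"pytorch"` backend string, the `.model` attribute on the returned container, and the opset version are assumptions about hummingbird / torch versions rather than details of the actual pipeline):

```python
# Illustrative sketch: xgboost -> torch graph via hummingbird, then -> onnx.
import numpy as np
import torch
import xgboost as xgb
from hummingbird.ml import convert

X = np.random.rand(200, 6).astype(np.float32)
y = (X[:, 0] > 0.5).astype(np.int64)

clf = xgb.XGBClassifier(n_estimators=20, max_depth=3).fit(X, y)

# convert the boosted trees into a container wrapping a torch.nn.Module
hb_model = convert(clf, "pytorch")

# export the underlying torch module to onnx so it can be decomposed into ops
torch.onnx.export(
    hb_model.model,
    torch.from_numpy(X[:1]),
    "xgb_classifier.onnx",
    opset_version=17,
)
```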
An observant reader will note that some operations, like `ArgMax` or `Gather`, don't have particularly obvious implementations in zero-knowledge circuits. This was the second leg of our sprint.
Dynamic / private indexing of values in ZK

In Python, a simple and innocent indexing operation over a one-dimensional tensor \(x\), `z = x[m]`, is trivial. But in ZK-circuits how do we enforce this sort of indexing, especially when the indices (like `m`) might be private (advice, in plonk parlance) values?

The first argument we constructed was one which allows us to implement zk-circuit equivalents of the `Gather` operation, which essentially just indexes a tensor `x` at a given set of indices. To allow for these indices to be advice values we need to construct a new kind of argument for indexing over vectors / tensors in a zk-circuit.
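In plaintext, the operation being constrained is just fancy indexing (a tiny illustrative snippet):

```python
# What Gather does outside a circuit: index a tensor at a set of indices.
import torch

x = torch.tensor([10., 20., 30., 40.])
m = torch.tensor([2, 0])  # in the ZK setting these indices may be private advice values
z = x[m]                  # tensor([30., 10.]) -- the relation we need to constrain
```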
an argument for private indexing

We use the `equals` argument (see appendix below for a full description of this argument) to generate the following constraint:

Altogether, this set of arguments and constraints allows us to constrain the claimed output \(z\) to be the \(m^{th}\) element of \(x\).
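A generic one-hot construction gives a feel for the shape of such an argument (a sketch of the standard technique; the exact constraint system may differ). Witness a selector vector \(b\) of length \(N\) and enforce:

\(b_i = \text{equals}(i, m) \quad \forall i \in \{0, \dots, N-1\}\), and \(z = \sum_{i=0}^{N-1} b_i \cdot x_i\)

The selector \(b\) is forced to be the indicator of position \(m\), and the inner product with \(x\) then pins \(z\) to \(x[m]\).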
The construction of argmax and argmin is very similar to the private indexing argument (and in fact leverages it). We add one additional constraint, which is that, for a claimed \(m = \text{argmax}(x)\), we should have \(x[m] = \text{max}(x)\).
an argument for private argmax / argmin
Say we want to calculate \(m = \text{argmax}(x)\), where \(x\) is of length \(N\).
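As a compressed sketch (assuming the argument composes the private indexing and max arguments exactly as described above), the constraints amount to:

\(z = x[m]\) (via the private indexing argument), and \(z = \text{max}(x)\) (via the max argument in the appendix)

Together these force the claimed index \(m\) to point at a maximal element of \(x\).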
For argmin you only need to replace the above `max` operations with `min` :)

Conclusion
You can try out colab notebooks for the new tree based models at:

All these models (when properly calibrated using ezkl) output predictions that are less than \(0.1\%\) away from the original `sklearn`, `xgboost`, and `lightgbm` predictions.

References
Appendix
equals argument
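A standard is-zero style construction for \(e = \text{equals}(a, b)\) (a sketch of the general technique; the exact gate may differ) witnesses an inverse hint \(v\) and enforces:

\(e \cdot (a - b) = 0\), and \(1 - e - v \cdot (a - b) = 0\)

If \(a = b\) the second constraint forces \(e = 1\); if \(a \neq b\) the first forces \(e = 0\) and the prover supplies \(v = (a - b)^{-1}\) to satisfy the second.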
max argument
`ReLU` element-wise operation.
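One standard way to build such an argument with a `ReLU` element-wise operation (a sketch of the general technique; the exact constraints may differ) is, for a claimed \(z = \text{max}(x)\), to enforce:

\(\text{ReLU}(x_i - z) = 0 \quad \forall i\), and \(\sum_i \text{equals}(x_i, z) \geq 1\)

The first family of constraints forces \(z \geq x_i\) for every \(i\); the second forces \(z\) to actually equal one of the entries of \(x\).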
sum argument

Consider the following plonk columns:
The sum between vectors \(a\) and \(b\) is then enforced using the following constraints \(\forall i\): \(a_i + b_i + m_{i-1} = m_i\)
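As a tiny worked instance (assuming the running sum is initialised with \(m_{-1} = 0\)): for \(a = (1, 2, 3)\) and \(b = (4, 5, 6)\) the constraints force \(m_0 = 5\), \(m_1 = 12\), and \(m_2 = 21\), so the final cell holds the total sum of both vectors.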