# Diving Deeper into Blaze: NTT module
In our previous [blog post](https://medium.com/@ingonyama/introducing-blaze-zk-acceleration-for-fpga-6f5f7cc50e1f) we described [Blaze](https://github.com/ingonyama-zk/blaze) - a Rust library for ZK acceleration on Xilinx FPGAs.
Since the release of Blaze, we have been actively working on its architecture and on an API for our NTT primitive implementation. Today we are ready to introduce a new module for working with NTT.
## What is the NTT module in a nutshell?
Blaze architecture makes it easy to add new modules. In our introductory [Blaze blog post](https://medium.com/@ingonyama/introducing-blaze-zk-acceleration-for-fpga-6f5f7cc50e1f) we described the Poseidon hash function, and here we will describe the NTT module.
NTT, or Number Theoretic Transform, is the term used to describe a Discrete Fourier Transform (DFT) over finite fields. Our module provides an API for computing an NTT of size 2^27. To use it, pass the input as a byte vector in which each element is encoded as little-endian bytes. The result is a byte vector of the same form, with each element again encoded as little-endian bytes.
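To make the definition concrete, here is a minimal sketch of a naive O(n^2) NTT over the toy field F_17 with n = 8 and ω = 2 (a primitive 8th root of unity mod 17), plus a helper that serializes elements as little-endian bytes. The function names, the toy parameters and the 8-byte element width are assumptions chosen for illustration. They are not the field, size or element width used by the Blaze NTT module, and the hardware relies on a much faster algorithm.

```rust
/// Modular exponentiation: base^exp mod modulus (square-and-multiply).
fn pow_mod(mut base: u64, mut exp: u64, modulus: u64) -> u64 {
    let mut acc = 1 % modulus;
    base %= modulus;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % modulus;
        }
        base = base * base % modulus;
        exp >>= 1;
    }
    acc
}

/// Naive NTT: X[k] = sum over j of x[j] * omega^(j*k) mod p.
/// `omega` must be a primitive n-th root of unity mod `p`, where n = input.len().
/// Intended for small toy moduli only (no overflow protection for large `p`).
fn naive_ntt(input: &[u64], omega: u64, p: u64) -> Vec<u64> {
    let n = input.len() as u64;
    (0..n)
        .map(|k| {
            input.iter().enumerate().fold(0, |acc, (j, &x)| {
                (acc + x % p * pow_mod(omega, j as u64 * k, p)) % p
            })
        })
        .collect()
}

/// Serialize each element as 8 little-endian bytes (hypothetical element width).
fn to_le_bytes(elements: &[u64]) -> Vec<u8> {
    elements.iter().flat_map(|e| e.to_le_bytes()).collect()
}
```

For example, `naive_ntt(&[0, 1, 0, 0, 0, 0, 0, 0], 2, 17)` returns the successive powers of ω, `[1, 2, 4, 8, 16, 15, 13, 9]`, and `to_le_bytes` places the least significant byte of each element first.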
## How is NTT Structured from a Developer’s Point of View?
In this brief blog post we will not go into depth about how and why the calculations are built. You can read about this in a series of previously published posts:
1. [NTT 201 - Foundations of NTT Hardware Design](https://medium.com/@ingonyama/ntt-201-foundations-of-ntt-hardware-design-fb8a4491d99f)
2. [Foundations of NTT Hardware Design, Chapter 2: NTT in Practice](https://medium.com/@ingonyama/foundations-of-ntt-hardware-design-chapter-2-ntt-in-practice-a812df80a033)
More information on this subject will be released as part of our upcoming NTT Webinar.
The important thing for us is that on-device memory (in our case we’re working with HBM) is divided into two buffers:
1. In one of the buffers the host writes data - our input vector.
2. In the other buffer the computation takes place. Once it finishes, the two buffers swap roles.
The main advantage of this design (especially having two buffers) is that it supports back-to-back NTT computations.
So our calculation involves the following steps:
1. Host writes input vector to card/device memory (can be HBM or DDR)
2. Previously written data is read to FPGA
3. Data is processed in FPGA
4. Processed data is written back to card/device memory
5. Host gets result from HBM
![HBM Double Buffer](https://hackmd.io/_uploads/Syph9Kk32.png)
Our design supports writing a new vector and reading the previous result in parallel.
Additionally, in the current version of the driver, the input byte vector must be divided into 16 segments, which we will call banks. The partitioning into banks is not sequential; it depends on how further calculations will be done. At this stage of implementation, Blaze is responsible for all required conversions, so no additional application integration or data manipulation is required from the end user.
![Data Organization in HBM](https://hackmd.io/_uploads/HJj0HFJ33.png)
A detailed description of partitioning can be found in Section 2.4.1 Data Organization in our [White paper](https://github.com/ingonyama-zk/papers/blob/main/ntt_201_book.pdf).
## Using Blaze
:fire: A full description of the tests, which include the binary loading process and calculations, will be available in the latest release. The binary file for NTT will be located there as well.
### Adding Blaze to an existing Rust project
First and foremost, let's add [blaze](https://github.com/ingonyama-zk/blaze) to your project. To do this, run the following cargo command:
```shell!
cargo add ingo-blaze --git "https://github.com/ingonyama-zk/blaze.git"
```
After this you can see blaze in your dependencies:
```toml!
[dependencies]
ingo-blaze = { git = "https://github.com/ingonyama-zk/blaze.git"}
```
### Create connection to FPGA using DriverClient
The Blaze architecture is designed so that we can load different drivers on the same FPGA. For this purpose we separate the connection to the hardware itself from the communication with it (which goes directly through the module API).
To create a connection, it is necessary to specify the slot and type of card with which we will work. So far we support only the Xilinx C1100/U250 installed locally, but in the future we will add support for other cards as well as AWS F1 instances.
```rust!
use ingo_blaze::driver_client::*;
let dclient = DriverClient::new("0", DriverConfig::driver_client_cfg(CardType::U250));
```
### Load program for NTT on FPGA
After opening the connection, let's load our [driver](https://github.com/ingonyama-zk/blaze/releases/tag/v0.4) (a program that describes how to perform specific calculations on the FPGA). To do this we need to specify the path to our file and load it into memory:
```rust!
// `bin_fname` is the path to the NTT binary downloaded from the release page
let bin = ingo_blaze::utils::read_binary_file(&bin_fname)?;
```
Next we need to check if our FPGA is ready to load the driver and then directly load it on the FPGA:
```rust!
dclient.setup_before_load_binary()?;
dclient.load_binary(&bin)?;
```
An important note is that we can replace the loaded FPGA binary/image at run-time. That means you can reuse one connection for different versions of one driver or for other drivers (MSM, for example). Keep in mind that only a single driver can be loaded at a time.
### Create the client for NTT module
After we have successfully connected to our FPGA and set up the driver, we need to use this connection somewhere. As we mentioned before, each module must implement the trait [`DriverPrimitive`](https://github.com/ingonyama-zk/blaze/blob/98226ea2a07c7da8f8037fc7641d45117df6b94b/src/driver_client/dclient.rs#L28) based on the needs of a particular computation. So let's discuss what is hidden under each trait function for NTT.
The first step is always the creation of the client module itself. To do this, we need to specify its type and pass an already open connection:
```rust!
use ingo_blaze::ingo_ntt::*;
let driver = NTTClient::new(NTT::Ntt, dclient);
```
There is only one type for NTT for now, `NTT::Ntt`, but we can extend this module in the future.
If we look inside [`NTTClient`](https://github.com/ingonyama-zk/blaze/blob/98226ea2a07c7da8f8037fc7641d45117df6b94b/src/ingo_ntt/ntt_api.rs#L12), like other modules it is described by the following structure:
```rust!
pub struct NTTClient {
    ntt_cfg: NTTConfig,
    pub driver_client: DriverClient,
}
```
where `driver_client` includes the general addresses for the FPGA, and [`NTTConfig`](https://github.com/ingonyama-zk/blaze/blob/98226ea2a07c7da8f8037fc7641d45117df6b94b/src/ingo_ntt/ntt_data.rs#L34) represents the address space specific to NTT:
```rust!
pub(super) const NOF_BANKS: usize = 16;
pub(super) struct NTTAddrs {
    pub hbm_ss_baseaddr: u64,
    pub hbm_addrs: [u64; NOF_BANKS],
}
pub(super) struct NTTConfig {
    pub ntt_addrs: NTTAddrs,
}
```
### Initialize the FPGA
For the NTT module, initialization currently lets us configure execution either as a whole NTT computation or as a partial execution, which we use to debug NTT.
```rust!
driver.initialize(NttInit{})?;
```
However, only full calculations are available to users. You can have a look inside the NTT [initialize method](https://github.com/ingonyama-zk/blaze/blob/98226ea2a07c7da8f8037fc7641d45117df6b94b/src/ingo_ntt/ntt_api.rs#L37).
### Reading/Writing to the FPGA
NTT, like other modules, implements functions to [write](https://github.com/ingonyama-zk/blaze/blob/98226ea2a07c7da8f8037fc7641d45117df6b94b/src/ingo_ntt/ntt_api.rs#L72) data to and [read](https://github.com/ingonyama-zk/blaze/blob/98226ea2a07c7da8f8037fc7641d45117df6b94b/src/ingo_ntt/ntt_api.rs#L110) data from the FPGA.
```rust!
// Writing to the FPGA
driver.set_data(NTTInput {
    buf_host,
    data: in_vec,
})?;
```
```rust!
// Waiting and reading result from the FPGA
driver.wait_result()?;
let res = driver.result(Some(buf_host))?.unwrap();
```
Let's dive a bit into what happens to our original byte vector after we pass it to write.
After receiving the inputs, the `NTTClient` starts the `preprocess` computation. [In this function](https://github.com/ingonyama-zk/blaze/blob/98226ea2a07c7da8f8037fc7641d45117df6b94b/src/ingo_ntt/ntt_data.rs#L80) the initial vector is distributed to the 16 banks in a particular order.
Next, each bank is written to the corresponding memory address.
```rust!
fn set_data(&self, input: NTTInput) -> Result<()> {
    let data_banks = NTTBanks::preprocess(input.data);
    data_banks
        .banks
        .into_iter()
        .enumerate()
        .try_for_each(|(i, data_in)| {
            let offset = self.ntt_cfg.ntt_bank_start_addr(i, input.buf_host);
            self.driver_client.dma_write(
                self.driver_client.cfg.dma_baseaddr,
                offset,
                data_in.as_slice(),
            )
        })
}
```
You can see that the memory address depends on which memory buffer the host (`buf_host`) is working with:
```rust!
pub(super) fn ntt_bank_start_addr(&self, bank_num: usize, buf_num: usize) -> u64 {
    self.hbm_bank_start_addr(bank_num) + (Self::NTT_BUFFER_SIZE * buf_num) as u64
}
```
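To see how the two buffers end up at different addresses, here is a minimal standalone sketch of the same arithmetic. The constant values and the body of `hbm_bank_start_addr` are assumptions made up for illustration; the real values and layout live in Blaze's `NTTConfig`.

```rust
const NOF_BANKS: usize = 16;
/// Hypothetical distance between the base addresses of two consecutive banks.
const BANK_STRIDE: u64 = 1 << 28;
/// Hypothetical number of bytes one buffer occupies inside a bank.
const NTT_BUFFER_SIZE: usize = 1 << 20;

/// Assumed layout: banks are laid out at a fixed stride starting from address 0.
fn hbm_bank_start_addr(bank_num: usize) -> u64 {
    assert!(bank_num < NOF_BANKS, "bank index out of range");
    bank_num as u64 * BANK_STRIDE
}

/// Same formula as in the snippet above: the second buffer (buf_num = 1)
/// starts one buffer size after the first one inside every bank.
fn ntt_bank_start_addr(bank_num: usize, buf_num: usize) -> u64 {
    hbm_bank_start_addr(bank_num) + (NTT_BUFFER_SIZE * buf_num) as u64
}
```

Under these assumed constants, buffer 0 of bank 3 starts at `3 * BANK_STRIDE`, and buffer 1 of the same bank starts `NTT_BUFFER_SIZE` bytes later.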
As for the result, the host does not actually receive a whole vector from the FPGA, but 16 banks that need to be processed:
```rust!
fn result(&self, buf_num: Option<usize>) -> Result<Option<Vec<u8>>> {
    let mut res_banks: NTTBanks = Default::default();
    for i in 0..NOF_BANKS {
        let offset = self.ntt_cfg.ntt_bank_start_addr(i, buf_num.unwrap());
        res_banks.banks[i] = vec![0; NTTConfig::NTT_BUFFER_SIZE];
        self.driver_client.dma_read(
            self.driver_client.cfg.dma_baseaddr,
            offset,
            &mut res_banks.banks[i],
        )?;
    }
    let res = res_banks.postprocess();
    Ok(Some(res))
}
```
So, just as with writing, we need to calculate the address again depending on the buffer, and then pass our banks to `postprocess`. You can see how the function is organised [here](https://github.com/ingonyama-zk/blaze/blob/98226ea2a07c7da8f8037fc7641d45117df6b94b/src/ingo_ntt/ntt_data.rs#L113).
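The idea that `preprocess` and `postprocess` are inverses of each other can be sketched with a toy layout. The round-robin distribution and the 4-byte element width below are assumptions for illustration only. Blaze's actual bank ordering is different and is described in Section 2.4.1 of the white paper.

```rust
const NOF_BANKS: usize = 16;
/// Hypothetical element width in bytes, not the width used by Blaze.
const ELEM_BYTES: usize = 4;

/// Toy split: element i goes to bank i % NOF_BANKS (NOT Blaze's real order).
fn preprocess(input: &[u8]) -> Vec<Vec<u8>> {
    assert_eq!(
        input.len() % (ELEM_BYTES * NOF_BANKS),
        0,
        "input must hold a multiple of NOF_BANKS whole elements"
    );
    let mut banks: Vec<Vec<u8>> = vec![Vec::new(); NOF_BANKS];
    for (i, elem) in input.chunks_exact(ELEM_BYTES).enumerate() {
        banks[i % NOF_BANKS].extend_from_slice(elem);
    }
    banks
}

/// Inverse of the toy split: interleave the banks back into one byte vector.
fn postprocess(banks: &[Vec<u8>]) -> Vec<u8> {
    assert_eq!(banks.len(), NOF_BANKS, "expected one entry per bank");
    let elems_per_bank = banks[0].len() / ELEM_BYTES;
    let mut out = Vec::with_capacity(elems_per_bank * ELEM_BYTES * NOF_BANKS);
    for row in 0..elems_per_bank {
        for bank in banks {
            out.extend_from_slice(&bank[row * ELEM_BYTES..(row + 1) * ELEM_BYTES]);
        }
    }
    out
}
```

Whatever ordering the hardware needs, the property that matters to the user is the round trip: `postprocess(&preprocess(&input))` returns the original `input`.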
### Run computation
While our read and write data functions depend on the host buffer, the start of the computation process is tied directly to the FPGA. So by swapping the `buf_host` and `buf_kernel` values we choose which section to start the calculation on.
Starting the computation looks like this:
```rust!
driver.start_process(Some(buf_kernel))?;
```
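The buffer swap is easier to follow with a small simulation that needs no hardware. The sketch below uses a hypothetical `MockDevice` in place of `NTTClient`: its methods only mimic the names `set_data`, `start_process` and `result`, their signatures are simplified, and the "kernel" just reverses bytes as a placeholder for the NTT. On real hardware the host and kernel phases of a round run in parallel, while here they run one after another.

```rust
/// Hypothetical stand-in for the FPGA: two device buffers and a fake kernel.
#[derive(Default)]
struct MockDevice {
    buffers: [Vec<u8>; 2],
}

impl MockDevice {
    /// Host writes an input vector into the buffer it currently owns.
    fn set_data(&mut self, buf_host: usize, data: &[u8]) {
        self.buffers[buf_host] = data.to_vec();
    }

    /// Kernel processes a buffer in place. Reversing bytes is a placeholder, not an NTT.
    fn start_process(&mut self, buf_kernel: usize) {
        self.buffers[buf_kernel].reverse();
    }

    /// Host reads a finished result back from the buffer it currently owns.
    fn result(&self, buf_host: usize) -> Vec<u8> {
        self.buffers[buf_host].clone()
    }
}

/// Push all inputs through the two buffers back to back and collect the outputs in order.
fn run_back_to_back(inputs: &[Vec<u8>]) -> Vec<Vec<u8>> {
    let mut dev = MockDevice::default();
    let (mut buf_host, mut buf_kernel) = (0usize, 1usize);
    let mut outputs = Vec::with_capacity(inputs.len());

    // Two extra rounds drain the pipeline after the last input is written.
    for round in 0..inputs.len() + 2 {
        // Host side: collect the result the kernel produced in the previous round...
        if round >= 2 {
            outputs.push(dev.result(buf_host));
        }
        // ...then reuse the same buffer for the next input, if there is one.
        if let Some(input) = inputs.get(round) {
            dev.set_data(buf_host, input);
        }
        // Kernel side: process the vector the host wrote in the previous round.
        if (1..=inputs.len()).contains(&round) {
            dev.start_process(buf_kernel);
        }
        // Swap roles: the buffer just processed becomes the host's buffer.
        std::mem::swap(&mut buf_host, &mut buf_kernel);
    }
    outputs
}
```

Each buffer alternates between being filled by the host and being processed by the kernel, so after the first round the device never sits idle waiting for a transfer.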
## Conclusion
We are excited to see what the community builds with Blaze! We welcome your contributions to the project on GitHub.
## Follow Ingonyama
Twitter: https://twitter.com/Ingo_zk
YouTube: https://www.youtube.com/@ingo_zk
LinkedIn: https://www.linkedin.com/company/ingonyama
Join us: https://www.ingonyama.com/careers