Diving Deeper into Blaze: NTT module

In our previous blog post we described Blaze, a Rust library for ZK acceleration on Xilinx FPGAs.
Since the release of Blaze, we have been actively working on its architecture and on exposing an API for our NTT primitive implementation. Today we are ready to introduce a new module for working with NTT.

What is the NTT module in a nutshell?

Blaze architecture makes it easy to add new modules. In our introductory Blaze blog post we described the Poseidon hash function, and here we will describe the NTT module.

NTT, or Number Theoretic Transform, is the term used to describe a Discrete Fourier Transform (DFT) over finite fields. Our module provides an API for computing an NTT of size 2^27. To use it, you pass the input as a byte vector of elements, where each element is represented as little-endian bytes. The result is a byte vector in the same format: each element is represented as little-endian bytes.
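
To make the definition concrete, here is a minimal sketch of a naive O(n^2) NTT over a toy field. The modulus 17, the size 4, and the root of unity 4 are illustrative assumptions chosen so the arithmetic can be checked by hand; they are unrelated to the field and the 2^27 size that the Blaze module works with, and the function names are hypothetical.

```rust
/// Modular exponentiation: computes base^exp mod modulus by square-and-multiply.
fn pow_mod(mut base: u64, mut exp: u64, modulus: u64) -> u64 {
    let mut acc = 1u64;
    base %= modulus;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % modulus;
        }
        base = base * base % modulus;
        exp >>= 1;
    }
    acc
}

/// Naive NTT: out[k] = sum over j of input[j] * omega^(j*k) mod p,
/// where omega is a primitive n-th root of unity mod p and n = input.len().
fn naive_ntt(input: &[u64], omega: u64, p: u64) -> Vec<u64> {
    let n = input.len() as u64;
    (0..n)
        .map(|k| {
            input.iter().enumerate().fold(0u64, |acc, (j, &x)| {
                // omega^n = 1, so the exponent can be reduced mod n.
                (acc + x * pow_mod(omega, (j as u64 * k) % n, p)) % p
            })
        })
        .collect()
}

fn main() {
    // Toy parameters (assumptions): p = 17, n = 4, and omega = 4,
    // which is a primitive 4th root of unity mod 17 (4^2 = 16, 4^4 = 1).
    let input: [u64; 4] = [1, 2, 3, 4];
    let output = naive_ntt(&input, 4, 17);
    println!("{:?}", output); // prints [10, 7, 15, 6]
}
```

A hardware implementation replaces this quadratic loop with a much faster structured algorithm, but the input/output relationship is the same.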

How is NTT Structured from a Developer’s Point of View?

In this brief blog post we will not dive into how and why the calculations are built. You can read about this in a series of previously published posts:

More information on this subject will be released as part of our upcoming NTT Webinar.
The important thing for us is that on-device memory (in our case we’re working with HBM) is divided into two buffers:

In one of the buffers the host writes data - our input vector
In the other buffer a computation is taking place, and then they swap places.

The main advantage of this design (especially having two buffers) is that it supports back-to-back NTT computation.

So our calculation involves the following steps:

Host writes input vector to card/device memory (can be HBM or DDR)
Previously written data is read to FPGA
Data is processed in FPGA
Processed data is written back to card/device memory
Host gets result from HBM

HBM Double Buffer

Our design supports writing a new vector and reading the previous result in parallel.

Additionally, in the current version of the driver, the input byte vector must be divided into 16 segments, which we will call banks. The partitioning into banks is not sequential; it is based on how further calculations will be done. At this stage of implementation, Blaze is responsible for all required conversions, so no additional application integration or data manipulation is required from the end user.

Data Organization in HBM

A detailed description of partitioning can be found in Section 2.4.1 Data Organization in our White paper.
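
As a rough illustration of what splitting a byte vector into 16 banks and reassembling it can look like, here is a minimal sketch. It assumes a simple round-robin layout (element i goes to bank i mod 16) and a 4-byte element; both are assumptions made for the example and are not the layout Blaze uses, which is the one described in the white paper. The function names are hypothetical.

```rust
const NOF_BANKS: usize = 16;

/// Hypothetical layout: element i is appended to bank i % NOF_BANKS.
/// This is NOT the real Blaze partitioning, only an illustration.
fn split_into_banks(data: &[u8], elem_size: usize) -> Vec<Vec<u8>> {
    assert!(elem_size > 0 && data.len() % (elem_size * NOF_BANKS) == 0);
    let mut banks: Vec<Vec<u8>> = vec![Vec::new(); NOF_BANKS];
    for (i, elem) in data.chunks_exact(elem_size).enumerate() {
        banks[i % NOF_BANKS].extend_from_slice(elem);
    }
    banks
}

/// Inverse of `split_into_banks`: interleaves the banks back into one vector.
fn merge_banks(banks: &[Vec<u8>], elem_size: usize) -> Vec<u8> {
    assert!(!banks.is_empty() && elem_size > 0);
    let elems_per_bank = banks[0].len() / elem_size;
    let mut out = Vec::with_capacity(banks.len() * banks[0].len());
    for row in 0..elems_per_bank {
        for bank in banks {
            out.extend_from_slice(&bank[row * elem_size..(row + 1) * elem_size]);
        }
    }
    out
}

fn main() {
    let elem_size = 4; // assumed element width in bytes, for illustration only
    let data: Vec<u8> = (0..elem_size * NOF_BANKS * 2).map(|x| x as u8).collect();

    let banks = split_into_banks(&data, elem_size);
    assert_eq!(banks.len(), NOF_BANKS);
    // Bank 0 receives element 0 (bytes 0..4) and element 16 (bytes 64..68).
    assert_eq!(banks[0], vec![0, 1, 2, 3, 64, 65, 66, 67]);
    // Merging the banks restores the original vector.
    assert_eq!(merge_banks(&banks, elem_size), data);
}
```

Whatever the actual layout is, the key property is the same as in this sketch: the split applied before writing and the merge applied after reading are inverses, so the user only ever sees a flat byte vector.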

Using Blaze

A full description of the tests, which include the binary loading process and calculations, will be available in the latest release. The binary file for NTT will be located there as well.

Adding Blaze to an existing Rust project

First and foremost, let's add Blaze to your project. To do this, run the following cargo command:

cargo add ingo-blaze --git "https://github.com/ingonyama-zk/blaze.git"

After this you can see blaze in your dependencies:

[dependencies]
ingo-blaze = { git = "https://github.com/ingonyama-zk/blaze.git"}

Create connection to FPGA using DriverClient

The Blaze architecture is designed so that we can load different drivers on the same FPGA. For this purpose we separate the connection to the hardware itself from the communication with it (which goes directly through the module API).

To create a connection, it is necessary to specify the slot and the type of card with which we will work. So far we support only locally installed Xilinx C1100/U250 cards, but in the future we will add support for other cards as well as AWS F1 instances.

use ingo_blaze::driver_client::*;

let dclient = DriverClient::new("0", DriverConfig::driver_client_cfg(CardType::U250));

Load program for NTT on FPGA

After opening the connection, let's load our driver (a program that describes how to perform specific calculations on the FPGA). To do this we need to specify the path to our file and load it into memory:

let bin = ingo_blaze::utils::read_binary_file(&bin_fname)?;

Next we need to check if our FPGA is ready to load the driver and then directly load it on the FPGA:

dclient.setup_before_load_binary()?;
dclient.load_binary(&bin)?;

An important note is that we can replace the loaded FPGA binary/image at run time. That means you can reuse one connection for different versions of one driver or for other drivers (MSM, for example). Keep in mind that only a single driver can be loaded at a time.

Create the client for NTT module

After we have successfully connected to our FPGA and set up the driver, we need to use this connection somewhere. As we mentioned before, each module must implement the DriverPrimitive trait based on the needs of a particular computation. So let's discuss what's hidden under each trait function for NTT.

The first step is always the creation of the client module itself. To do this, we need to specify its type and pass an already open connection:

use ingo_blaze::ingo_ntt::*;
let driver = NTTClient::new(NTT::Ntt, dclient);

There is only one type for NTT for now, NTT::Ntt, but we can extend this module in the future.

If we look inside NTTClient, like other modules it is described by the following structure:

pub struct NTTClient {
    ntt_cfg: NTTConfig,
    pub driver_client: DriverClient,
}

where driver_client includes general addresses for the FPGA, and NTTConfig represents the address space specific to NTT:

pub(super) const NOF_BANKS: usize = 16;

pub(super) struct NTTAddrs {
    pub hbm_ss_baseaddr: u64,
    pub hbm_addrs: [u64; NOF_BANKS],
}

pub(super) struct NTTConfig {
    pub ntt_addrs: NTTAddrs,
}

Initialize the FPGA

For the NTT module, initialization currently lets us configure execution either as a whole NTT computation or as a partial execution, which we use to debug NTT.

driver.initialize(NttInit{})?;

However, only the full computation is available to users. You can have a look inside the NTT initialize method.

Reading/Writing to the FPGA

NTT, like other modules, implements functions to write data to and read data from the FPGA.

// Writing to the FPGA
driver.set_data(NTTInput {
    buf_host,
    data: in_vec,
})?;

// Waiting and reading result from the FPGA
driver.wait_result()?;
let res = driver.result(Some(buf_host))?.unwrap();
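
The data passed to set_data is the little-endian byte vector described earlier. As a minimal sketch of how such a vector could be built and decoded on the host, the example below assumes each element is 32 bytes wide and is held as four 64-bit limbs, least-significant limb first. The element width, the limb representation, and the helper names are assumptions made for illustration and are not taken from the Blaze API.

```rust
use std::convert::TryInto;

/// Assumed element width in bytes (hypothetical, e.g. a 256-bit field element).
const ELEM_BYTES: usize = 32;

/// Serializes elements (four u64 limbs each, least-significant limb first)
/// into one flat little-endian byte vector.
fn elements_to_le_bytes(elements: &[[u64; 4]]) -> Vec<u8> {
    let mut out = Vec::with_capacity(elements.len() * ELEM_BYTES);
    for elem in elements {
        for limb in elem {
            out.extend_from_slice(&limb.to_le_bytes());
        }
    }
    out
}

/// Parses a flat little-endian byte vector back into four-limb elements.
fn elements_from_le_bytes(bytes: &[u8]) -> Vec<[u64; 4]> {
    assert_eq!(bytes.len() % ELEM_BYTES, 0, "length must be a multiple of the element size");
    bytes
        .chunks_exact(ELEM_BYTES)
        .map(|chunk| {
            let mut limbs = [0u64; 4];
            for (limb, raw) in limbs.iter_mut().zip(chunk.chunks_exact(8)) {
                *limb = u64::from_le_bytes(raw.try_into().expect("chunk is 8 bytes"));
            }
            limbs
        })
        .collect()
}

fn main() {
    let elements = vec![[1u64, 0, 0, 0], [0x0102030405060708, 0, 0, 0]];
    let in_vec = elements_to_le_bytes(&elements);

    assert_eq!(in_vec.len(), 2 * ELEM_BYTES);
    // Little-endian: the least-significant byte comes first.
    assert_eq!(in_vec[0], 1);
    assert_eq!(in_vec[32], 0x08);
    // Decoding restores the original elements.
    assert_eq!(elements_from_le_bytes(&in_vec), elements);
}
```

The same decoding step would apply to the byte vector returned by result, since the output uses the same representation as the input.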

Let's dive a bit into what happens to our original byte vector after we pass it to write.

After receiving the inputs, the NTTClient starts the preprocess computation. In this function the initial vector is distributed to the 16 banks in a particular order.

Next, each bank is written to the corresponding memory address.

fn set_data(&self, input: NTTInput) -> Result<()> {
    let data_banks = NTTBanks::preprocess(input.data);

    data_banks
        .banks
        .into_iter()
        .enumerate()
        .try_for_each(|(i, data_in)| {
            let offset = self.ntt_cfg.ntt_bank_start_addr(i, input.buf_host);
            self.driver_client.dma_write(
                self.driver_client.cfg.dma_baseaddr,
                offset,
                data_in.as_slice(),
            )
        })
}

You can see that the memory address depends on which memory buffer the host (buf_host) is working with:

    pub(super) fn ntt_bank_start_addr(&self, bank_num: usize, buf_num: usize) -> u64 {
        self.hbm_bank_start_addr(bank_num) + (Self::NTT_BUFFER_SIZE * buf_num) as u64
    }
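
To see how this formula selects a region, here is a self-contained sketch of the same address arithmetic. The buffer size and the bank base addresses below are made-up placeholder values, and the struct is a simplified stand-in for the real configuration; only the formula itself (bank base plus buffer size times buffer index) comes from the snippet above.

```rust
const NOF_BANKS: usize = 16;

// Placeholder values for illustration only; the real constants live in Blaze.
const NTT_BUFFER_SIZE: usize = 1 << 20; // hypothetical per-bank buffer size in bytes
const HBM_BANK_STRIDE: u64 = 1 << 28; // hypothetical distance between bank base addresses

/// Simplified stand-in for the NTT address configuration.
struct ExampleAddrs {
    hbm_addrs: [u64; NOF_BANKS],
}

impl ExampleAddrs {
    fn hbm_bank_start_addr(&self, bank_num: usize) -> u64 {
        self.hbm_addrs[bank_num]
    }

    /// Same formula as in the text: bank base + buffer size * buffer index.
    fn ntt_bank_start_addr(&self, bank_num: usize, buf_num: usize) -> u64 {
        self.hbm_bank_start_addr(bank_num) + (NTT_BUFFER_SIZE * buf_num) as u64
    }
}

/// Builds an example configuration with evenly spaced (hypothetical) bank bases.
fn example_addrs() -> ExampleAddrs {
    let mut hbm_addrs = [0u64; NOF_BANKS];
    for (i, addr) in hbm_addrs.iter_mut().enumerate() {
        *addr = i as u64 * HBM_BANK_STRIDE;
    }
    ExampleAddrs { hbm_addrs }
}

fn main() {
    let addrs = example_addrs();
    for buf_num in 0..2 {
        println!(
            "bank 3, buffer {} starts at {:#x}",
            buf_num,
            addrs.ntt_bank_start_addr(3, buf_num)
        );
    }
    // Within one bank, buffer 1 starts exactly one buffer size after buffer 0.
    assert_eq!(
        addrs.ntt_bank_start_addr(3, 1) - addrs.ntt_bank_start_addr(3, 0),
        NTT_BUFFER_SIZE as u64
    );
}
```

In this layout, each bank holds both buffers side by side, so switching buffers only changes the offset within every bank.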

In terms of the result, the host does not actually receive a whole vector from the FPGA, but 16 banks that need to be post-processed:

fn result(&self, buf_num: Option<usize>) -> Result<Option<Vec<u8>>> {
    let mut res_banks: NTTBanks = Default::default();
    for i in 0..NOF_BANKS {
        let offset = self.ntt_cfg.ntt_bank_start_addr(i, buf_num.unwrap());
        res_banks.banks[i] = vec![0; NTTConfig::NTT_BUFFER_SIZE];
        self.driver_client.dma_read(
            self.driver_client.cfg.dma_baseaddr,
            offset,
            &mut res_banks.banks[i],
        )?;
    }

    let res = res_banks.postprocess();
    Ok(Some(res))
}

So, just as with writing, we need to calculate the address again depending on the buffer, and then pass our banks to postprocess. You can see how the function is organised here.

Run computation

While our read and write data functions depend on the host buffer, the start of the computation process is tied directly to the FPGA. So by swapping the buf_host and buf_kernel values we choose which section to start the calculation on.

The starting itself looks like this:

driver.start_process(Some(buf_kernel))?;
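
The buffer swapping can be sketched without hardware. The example below uses a hypothetical in-memory MockDevice with two buffers and a placeholder "computation" (adding 1 to every byte) in place of a real NTT. The call order is an assumption about how a back-to-back loop could be organised, not a description of the Blaze tests. On real hardware the host write and the kernel computation in each round can overlap; here they run one after the other.

```rust
/// Hypothetical stand-in for the device: two buffers, as in the double-buffer design.
struct MockDevice {
    buffers: [Vec<u8>; 2],
}

impl MockDevice {
    /// Host writes an input vector into the given buffer.
    fn write(&mut self, buf: usize, data: &[u8]) {
        self.buffers[buf] = data.to_vec();
    }

    /// Placeholder for the kernel computation: adds 1 to every byte in place.
    fn process(&mut self, buf: usize) {
        for byte in self.buffers[buf].iter_mut() {
            *byte = byte.wrapping_add(1);
        }
    }

    /// Host reads the contents of the given buffer.
    fn read(&self, buf: usize) -> Vec<u8> {
        self.buffers[buf].clone()
    }
}

/// Processes all inputs back to back, swapping buf_host and buf_kernel each round.
fn run_back_to_back(inputs: &[Vec<u8>]) -> Vec<Vec<u8>> {
    let mut dev = MockDevice { buffers: [Vec::new(), Vec::new()] };
    let (mut buf_host, mut buf_kernel) = (0usize, 1usize);
    let mut results = Vec::with_capacity(inputs.len());

    // One extra round at the end drains the last result.
    for round in 0..=inputs.len() {
        if round < inputs.len() {
            dev.write(buf_host, &inputs[round]); // host fills its buffer
        }
        if round > 0 {
            dev.process(buf_kernel); // kernel works on the previously written vector
        }
        std::mem::swap(&mut buf_host, &mut buf_kernel);
        if round > 0 {
            // After the swap, buf_host points at the buffer that was just processed.
            results.push(dev.read(buf_host));
        }
    }
    results
}

fn main() {
    let inputs = vec![vec![1u8, 2], vec![10, 20], vec![100, 200]];
    let results = run_back_to_back(&inputs);
    assert_eq!(results, vec![vec![2u8, 3], vec![11, 21], vec![101, 201]]);
    println!("{:?}", results);
}
```

With this ordering, a buffer is overwritten by new input only after its previous result has been read, which is what makes the two-buffer scheme safe.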

Conclusion

We are excited to see what the community builds with Blaze! We welcome your contributions to the project on GitHub.

Follow Ingonyama

Twitter: https://twitter.com/Ingo_zk

YouTube: https://www.youtube.com/@ingo_zk

LinkedIn: https://www.linkedin.com/company/ingonyama

Join us: https://www.ingonyama.com/careers

Leon Hibnik
