IBM sorting a table

# Introduction

This challenge was developed by [IBM Research](https://research.ibm.com).

The objective of the challenge is to sort a multi-column table $T$ under CKKS. Sorting tables is a fundamental database operation, and performing it efficiently under FHE will have a large impact on implementing database systems under FHE.

## Input data

The input is a table, where the number of columns in the table is also part of the input. See more below on the API you are requested to develop.

The first column is special and is called `key`. It is an integer in the range $[0, 255]$. The table should be sorted according to that key; see also the notes below about encoding the key.

## Table

The table contains one column titled `key` and multiple data columns ($data_1, data_2, ..., data_n$). The number of data columns, $n$, is part of the input, and the program receives it when initializing the database.

| *key* | $data_1$ | ... | $data_n$ |
| --- | ------ | --- | ------ |
| 23 | 55.23 | ... | 78.27 |
| 88 | 34.12 | ... | 56.74 |
| 12 | 65.23 | ... | 34.23 |

See the notes below about the encoding of the key.

## Values

The keys are integers in $[0,255]$ and the data are real numbers in $[0,255]$.

## Key

To let you focus on the implementation of the table rather than on comparing values, the key is given as 8 encrypted bits (and not as a single encrypted integer). So the table will look like this:

| $key_1$ | ... | $key_8$ | $data_1$ | ... | $data_n$ |
| ----- | --- | ----- | ------ | --- | ------ |
| 0 | ... | 1 | 55.23 | ... | 78.27 |

### Comparing keys

The goal of the challenge is the sorting algorithm and **not** to improve the comparison function. You are therefore requested to implement an 8-bit comparator. The details of the implementation of this comparator depend on the packing you use (see below on packing).

## Packing

You may use any packing you like as long as your implementation does not leak information.
Remember that the storage overhead your implementation adds is one of the metrics you are graded on.

You can choose the packing when initializing the database. For example, your code may choose one packing when the number of data columns is 8 and a different packing when the number of data columns is 16. To do that, you are encouraged to use *packing-oblivious* programming (see [Tutorial on packing methods](https://research.ibm.com/haifa/dept/vst/tutorial_ccs2023.html)).

## Tasks

To avoid gaming the challenge (for example, sorting in plaintext before encryption or after decryption), you are requested to provide the tools listed below. All tools receive the public key, $pk$.

1. ``init pk n`` - init a database with ``n`` data columns.
2. ``add pk k d1 d2 d3 ...`` - add a new record with key column equal to ``k`` and data columns given as ``d1, d2, ...``. You can assume $k\in[0,255]$ and $d_i\in[0,255]$ for all $i$. You can also assume the input is valid.
3. ``sort pk`` - sort the database according to the key column.
4. ``extract i`` - extract the $i$-th data column. The output of this tool is one or more ciphertexts that encode the values of the $data_i$ column. The $j$-th slot holds the value of the $j$-th record (either sorted or not, depending on whether ``sort`` was called). If there are more records than the number of slots $N_s$, then extract the first $N_s$ records. This ciphertext will then be decrypted by us and used to score your solution.

# References

## Packing-oblivious coding

You can use libraries that support packing-oblivious coding, such as HElayers. You can find a tutorial here: [HElayers Tutorial](https://research.ibm.com/haifa/dept/vst/tutorial_ccs2023.html). In particular, note [interleaved packing](https://colab.research.google.com/drive/1svQAt4KJkQAhlzbUo4fw94GJx8TcS6x5?usp=sharing), which you may find useful.

## Sorting networks

A lot of research has been done on sorting networks.
For example, [Batcher Sort](https://en.wikipedia.org/wiki/Batcher_odd%E2%80%93even_mergesort).

## Other useful Links

* [FHERMA participation guide](https://fherma.io/how_it_works): more about FHERMA challenges.
* [OpenFHE](https://github.com/openfheorg/openfhe-development) repository, README, and installation guide.
* [OpenFHE Python](https://github.com/openfheorg/openfhe-python) repository, README, and installation guide.
* [Lattigo](https://github.com/tuneinsight/lattigo)
* A vast collection of resources collected by [FHE.org](http://FHE.org) (https://fhe.org/resources), including tutorials and walk-throughs, use-cases and demos.
* [OpenFHE AAAI 2024 Tutorial](https://openfheorg.github.io/aaai-2024-lab-materials/): Fully Homomorphic Encryption for Privacy-Preserving Machine Learning Using the OpenFHE Library.

# Metrics

The solution needs to be as efficient as possible. Efficiency here means several things:

- **Time** - The solution should run as fast as possible.
- **Storage** - The storage overhead should be as low as possible. To give a fair chance to as many packing schemes as possible, the storage efficiency will be tested with very large databases.
- **Noise** - It is expected that the sorted table will include noise added by the FHE process. The goal is to keep the noise as small as possible.

In addition, the solution must also be:

- **Flexible** - The same code must handle tables of various sizes.
- **Private** - The solution must be private and not leak any data. Specifically, even a solution where it can be inferred that 2 records have the same key with a probability slightly higher than uniformly random is not acceptable. For example, using buckets when all the records fall in the same bucket will probably leak some information about the distribution of the key.

## Challenge Info

1. **Challenge type**: this challenge is a White Box challenge. Participants are required to submit the project with their source code.
You can learn more about this type of challenge in our [Participation guide](https://fherma.io/how_it_works).

2. **Encryption scheme**: CKKS.
3. **Supported libraries**: [OpenFHE](https://github.com/openfheorg/openfhe-development), [Lattigo](https://github.com/tuneinsight/lattigo).
4. **Input Data:** a bash script calling the specified tools to init a database, add records into the database, sort it and extract a column.
5. **Output Data:** the outcome should be an encrypted vector with the values of one of the columns of the database.

## Parameters of the key

1. **Bootstrapping**: the key supports bootstrapping.
2. **Number of slots**: $2^{16}$.
3. **Multiplication depth**: 29.
4. **Fractional part precision (ScaleModSize)**: 51 bits.
5. **Integer part precision**: 9 bits.

## Timeline

**June 03, 2023** - challenge starts.

**August 03, 2024** - submission deadline.

**August 18, 2024** - prize awarded.

## Parameters of the input

// TODO: move here everything about the input, like packing, range etc

## Parameters of the output

// TODO: move here everything about the output, like packing, range etc

## Test environment

### Hardware

Submissions will be evaluated on a single-core CPU.

// TODO: RAM?

### Software

The following libraries/packages will be used for generating test case data and for testing solutions:

- **OpenFHE:** v1.1.4
- **OpenFHE-Python:** v0.8.6
- **Lattigo:** v5.0.2

// TODO: we don't need the rest actually, do we?

- **HElayers:** v1.5.3.1 [Download](https://ibm.github.io/helayers/ 'Download HElayers')
- **pyhelayers:** v1.5.3.1 [Download](https://ibm.github.io/helayers/ 'Download pyhelayers')

## Submission

To address this challenge, participants can use one of the two libraries, OpenFHE or Lattigo. The executables in the project should be named `init`, `add`, `sort`, and `extract`.

### OpenFHE

If the solution is developed using the OpenFHE library, we expect it to have a CMake project.
The `CMakeLists.txt` file should be placed in the project's root directory. Please adhere to the following format when submitting your solution:

1. **File format:**
    - Your submission should be packed into a ZIP archive.
2. **Structure of the archive:**
    - Inside the ZIP archive, ensure there is a directory titled `app`.
    - Within the `app` directory, include your main `CMakeLists.txt` file and other necessary source files.

```mermaid
graph TD;
    app_zip[app.zip] --> app_folder[app]
    app_folder --> CMakeLists[CMakeLists.txt]
    app_folder --> main.cpp[main.cpp]
    app_folder --> config.json[config.json]
    app_folder --> ...[...]
```

#### Config file

// TODO: update this section according to final version of the challenge

You can use a config file to set parameters for generating a context on the server for testing the solution. An example of such a config and a detailed description of each parameter are given below.

```
{
    "indexes_for_rotation_key": [
        1
    ],
    "mult_depth": 29,
    "ring_dimension": 131072,
    "scale_mod_size": 59,
    "first_mod_size": 60,
    "batch_size": 65536,
    "enable_bootstrapping": false,
    "levels_available_after_bootstrap": 10,
    "level_budget": [4,4]
}
```

##### Parameters

* **indexes_for_rotation_key**: if an application requires the use of a rotation key, this option allows specifying indexes for the rotation key. If the rotation key is not used, it should be an empty array: `indexes_for_rotation_key=[]`.
* **mult_depth**: this parameter sets the multiplication depth.
* **ring_dimension**: the user can set the ring dimension. However, if a minimum ring dimension is set for the challenge, then the user can only increase this value; decreasing it is not possible.
* **scale_mod_size**: this parameter is used to configure `ScaleModSize`, default value is `51`.
* **first_mod_size**: this parameter sets `FirstModSize`, default value is `60`.
* **batch_size**: if bootstrapping is not used, this parameter sets the batch size. Default value is `ring_dimension/2`.
* **enable_bootstrapping**: if you need bootstrapping, set this option to `true`.
* **levels_available_after_bootstrap**: this parameter sets the number of levels available after bootstrapping, if it is used. Note that the actual number of levels available between one bootstrapping and the next will be `levels_available_after_bootstrap - 1`, because an additional level is used for scaling the ciphertext before the next bootstrapping (in 64-bit CKKS bootstrapping).
* **level_budget**: the bootstrapping procedure needs to consume a few levels to run. This parameter is used to call `EvalBootstrapSetup`. Default value is `[4,4]`.

### Lattigo

If the project is built using the Lattigo library, a Makefile is expected in the root directory of the project. Check out the project template for the Parity challenge [on GitHub](https://github.com/Fherma-challenges/parity/tree/main/lattigo/app).

## Command-line interface for application testing

The application must support the Command Line Interface (CLI) specified below.

### OpenFHE

- **--cc [path]:** the path to the crypto context file serialized in **BINARY** form.
- **--key_public [path]:** the path to the public key file.
- **--key_mult [path]:** the path to the evaluation (multiplication) key file.
- **--db [path]:** the path to the database file.

// TODO: no rotation key, right?

The `init` tool should have:

- **--columns [n]:** the number of data columns in the database.

The `add` tool should have:

- **--key [k]:** the key of the record.
- **--data [d1,d2,...]:** the data of the record given as comma-separated floats.

The `extract` tool should have:

- **--column [n]:** the data column to extract.
- **--out [path]:** the file to write the output to.

## Example

Below is an example of an input script.
Init a database with 3 data columns:

```
./init --cc [path] --key_public [path] --key_mult [path] --db my.db --columns 3
```

Add 3 records:

```
./add --cc [path] --key_public [path] --key_mult [path] --db my.db --key 78 --data 45.45,56.23,83.62
./add --cc [path] --key_public [path] --key_mult [path] --db my.db --key 12 --data 12.23,23.34,34.45
./add --cc [path] --key_public [path] --key_mult [path] --db my.db --key 68 --data 45.56,56.67,67.78
```

After these commands (remember the key is encoded as 8 bits; in this example $key_0$ is the least significant bit) the table should look like:

| $key_0$ | $key_1$ | $key_2$ | $key_3$ | $key_4$ | $key_5$ | $key_6$ | $key_7$ | $data_1$ | $data_2$ | $data_3$ |
| ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ------ | ------ | ------ |
| 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 45.45 | 56.23 | 83.62 |
| 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 12.23 | 23.34 | 34.45 |
| 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 45.56 | 56.67 | 67.78 |

Sort the database:

```
./sort --cc [path] --key_public [path] --key_mult [path] --db my.db
```

After this the table should look like:

| $key_0$ | $key_1$ | $key_2$ | $key_3$ | $key_4$ | $key_5$ | $key_6$ | $key_7$ | $data_1$ | $data_2$ | $data_3$ |
| ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ------ | ------ | ------ |
| 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 12.23 | 23.34 | 34.45 |
| 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 45.56 | 56.67 | 67.78 |
| 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 45.45 | 56.23 | 83.62 |

Extract the second data column:

```
./extract --cc [path] --key_public [path] --key_mult [path] --db my.db --out out.ctxt --column 2
```

After this, a ciphertext will be written to `out.ctxt` with this content:

| 23.34 | 56.67 | 56.23 |
| --- | -- | -- |

## Evaluation criteria

Submissions will be evaluated based on these criteria:

1. **Privacy:** the solution must not leak any data. We are not asking for a rigorous proof that the solution is secure, but you should be able to argue why your solution is secure.

The score of your solution will be calculated as $score = 10{,}000 \cdot s_{Acc} + s_{Tim} + 0.5 \cdot s_{Sto}$, where:

2. $s_{Acc}$ (**Correctness and accuracy**) is the $L_\infty$ norm distance between your output $A$ and the correct output $C$. That is, $s_{Acc} = \max_i |A[i] - C[i]|$.
3. $s_{Tim}$ (**Execution time**) is the running time (in seconds) of the ``sort`` utility.
4. $s_{Sto}$ (**Storage**) is the storage size (in megabytes) of ``my.db`` after inserting 262144 records with 16 columns each. Here we will take the maximum over these 2 cases:
    - all keys have the same value;
    - the keys are evenly distributed.

That is, the winner will be the application whose score is minimal.

// TODO: please note, **minimal score** here ^

## Scoring & awards

Solutions implemented with the OpenFHE and Lattigo libraries will be scored separately. The winner in each group will be awarded **\$2,500**.

One participant can be the winner in two groups.

Total challenge prize fund is **\$5,000**.

## Challenge committee

* [Gurgen Arakelov](https://www.linkedin.com/in/gurgen-arakelov-943172b9/), Fair Math
* [Jean-Philippe Bossuat](https://www.linkedin.com/in/jean-philippe-bossuat-9136024b/), Lattigo
* [Nikita Kaskov](https://www.linkedin.com/in/nikita-kaskov-07029812a/), Fair Math
* [Yuriy Polyakov](https://www.linkedin.com/in/yuriy-polyakov-796b84a/), Duality
* [Hayim Shaul](https://www.linkedin.com/in/hayim-shaul-b2658/), IBM Research

## Help

If you have any questions, you can:

* Contact us by email: support@fherma.io
* Join our [Discord](https://discord.gg/NfhXwyr9M5) server, and ask your questions in the [#fherma channel](https://discord.com/channels/1163764915803279360/1167875954392187030).
* Open an issue in the [GitHub Repository](https://github.com/Fherma-challenges/parity).
* Use [OpenFHE Discourse](https://openfhe.discourse.group) for OpenFHE related issues.
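
As a plaintext illustration of the key encoding and the 8-bit comparator described in the Key and Comparing keys sections, the sketch below models each key as eight 0/1 values and compares two keys using only additions and multiplications, which are the operations available under CKKS. It is not an encrypted implementation and not a reference solution: the helper names (`to_bits`, `less_than`, `compare_exchange`, `sort_records`) are hypothetical, the bit order (least significant bit first) is inferred from the example table, and a simple odd-even transposition network stands in for a more efficient network such as Batcher's. Under CKKS the bits are only approximately 0 or 1, so a real implementation would also have to manage noise and depth.

```python
def to_bits(key, width=8):
    """Encode an integer key as `width` bits, least significant bit first."""
    return [(key >> i) & 1 for i in range(width)]


def less_than(a_bits, b_bits):
    """Return 1 if a < b, else 0, using only +, -, * on 0/1 values."""
    lt = 0  # running result, scanning from the lowest bit to the highest
    for a, b in zip(a_bits, b_bits):
        bit_lt = (1 - a) * b             # 1 only when a = 0 and b = 1
        bit_eq = 1 - (a - b) * (a - b)   # 1 only when a == b
        lt = bit_lt + bit_eq * lt        # a higher bit overrides lower ones
    return lt


def compare_exchange(rec_a, rec_b, width=8):
    """Oblivious compare-exchange: returns (smaller-key record, larger-key record)."""
    swap = less_than(rec_b[:width], rec_a[:width])  # 1 if the pair is out of order
    lo = [(1 - swap) * x + swap * y for x, y in zip(rec_a, rec_b)]
    hi = [(1 - swap) * y + swap * x for x, y in zip(rec_a, rec_b)]
    return lo, hi


def sort_records(records, width=8):
    """Odd-even transposition network: the comparison pattern is data-independent."""
    recs = [list(r) for r in records]
    n = len(recs)
    for rnd in range(n):
        for i in range(rnd % 2, n - 1, 2):
            recs[i], recs[i + 1] = compare_exchange(recs[i], recs[i + 1], width)
    return recs


# The three records from the Example section: 8 key bits followed by 3 data columns.
records = [
    to_bits(78) + [45.45, 56.23, 83.62],
    to_bits(12) + [12.23, 23.34, 34.45],
    to_bits(68) + [45.56, 56.67, 67.78],
]
sorted_records = sort_records(records)
column_2 = [rec[8 + 1] for rec in sorted_records]  # data_2 sits after the 8 key bits
print(column_2)  # [23.34, 56.67, 56.23]
```

Because every compare-exchange is executed regardless of the key values, the sequence of operations does not depend on the data, which is the property the privacy requirement asks for.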

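
The score formula from the Evaluation criteria section can be sketched in a few lines. This is only an illustration of the arithmetic: the function names and argument names are hypothetical, and the timing and storage numbers in the usage line are made up for the example, not measurements.

```python
def linf_distance(output, expected):
    """s_Acc: the largest absolute difference between output and expected values."""
    return max(abs(a - c) for a, c in zip(output, expected))


def score(output, expected, sort_seconds, storage_same_keys_mb, storage_even_keys_mb):
    """score = 10,000 * s_Acc + s_Tim + 0.5 * s_Sto (lower is better)."""
    s_acc = linf_distance(output, expected)
    s_tim = sort_seconds
    # s_Sto is the larger of the two storage measurements described in the criteria.
    s_sto = max(storage_same_keys_mb, storage_even_keys_mb)
    return 10_000 * s_acc + s_tim + 0.5 * s_sto


# Made-up numbers: an error of about 0.001 in one slot, a 120 s sort, 300 MB of storage.
example_score = score([23.341, 56.67, 56.23], [23.34, 56.67, 56.23], 120.0, 300.0, 280.0)
print(round(example_score, 3))
```

With these weights, an error of 0.001 in a single slot costs as much as 10 seconds of sorting time or 20 MB of storage.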