IPFS Metadata Registry

This protocol is still a WIP; comments are very much appreciated.

Problem

IPFS has no built-in way to search the huge amount of data it already holds: to retrieve a file you must know exactly where it is, that is, you must know its content identifier (CID). This means there is no way to search the network using words.

As the decentralized web gets more attention, we expect more and more people to host applications and protocols on IPFS over the next few years. But, as on the pre-Google internet, there is no simple way to discover content.

Goal

The protocol aims to create a collaborative, open-source way to track (and search) relevant data on IPFS. The main goal is to create the "Google of IPFS", taking only the good parts. The service will not track users and will remain privacy-oriented, simply because the entire database is shared among the protocol's nodes and can be searched from a local copy.

This protocol does not guarantee data retrievability; that can be achieved with other on-chain protocols such as Retrieval Pinning.

Unlike the Filecoin Network Indexer, our goal is to find all the CIDs that contain a specific word. Once we have found an interesting CID, we can ask the Filecoin Network Indexer where to actually retrieve the file.

Protocol overview

The protocol is made up of several components:

  • IPDB, the database where the protocol stores its information. It is an on-chain database and will be integrated into the smart contract. An extension of IPDB for full-text search will be developed to help the frontend search the database easily. The database is a simple key/value store, so we link original_cid to metadata_cid, and only those two values go on-chain. The actual metadata is retrieved directly from IPFS using metadata_cid.

  • A smart contract (deployed on an EVM chain) that coordinates the work of the protocol's nodes and lets users ask for active indexing by paying a fee to the protocol.

  • Clients, who ask for active indexing of files by sending an on-chain transaction together with an index_fee. The amount required for an indexing request is the size of the file (or files) multiplied by index_price.

  • A network of nodes, also known as the oracle, which is the core of the protocol and listens to the smart contract for active indexing requests. When a user asks for indexing, the nodes retrieve the data and extract the metadata from the file (or folder). The nodes then try to reach consensus on the extracted metadata (by creating a new JSON file). Once n of m nodes have extracted the same metadata, an on-chain transaction carrying the aggregated signatures is sent to the contract and the values are stored. The threshold is a parameter of the smart contract; if it is not met before indexing_request_expiration (for example because the data is not retrievable), the user needs to send a second transaction.

  • A command-line interface to create replicas of the database, run local queries, and so on. It will expose a set of APIs that will be used by the dApp or can be consumed by custom interfaces.

  • A dApp that will be the public front page of the protocol, designed so that anyone can run it independently, download the entire database, and search it directly on their own computer. Each search is free because the information is stored and replicated on your local machine. We also expect to host a public website.
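
The n-of-m agreement among oracle nodes described above can be sketched as follows. This is only an illustration under stated assumptions: the node names, the metadata fields, and the use of a SHA-256 digest over canonical JSON (standing in for a real metadata_cid) are hypothetical and not part of the protocol.

```python
import hashlib
import json
from collections import Counter
from typing import Optional


def metadata_digest(metadata: dict) -> str:
    """Hash a canonical JSON encoding so equal metadata yields equal digests.

    Assumption: a SHA-256 hex digest stands in for the real metadata_cid.
    """
    canonical = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_consensus(reports: dict, threshold: int) -> Optional[str]:
    """Return the digest that at least `threshold` nodes agree on, else None."""
    if not reports:
        return None
    counts = Counter(metadata_digest(m) for m in reports.values())
    digest, votes = counts.most_common(1)[0]
    return digest if votes >= threshold else None


# Hypothetical reports from three oracle nodes for the same original_cid.
reports = {
    "node-a": {"title": "Report", "size": 1024},
    "node-b": {"size": 1024, "title": "Report"},  # same content, other key order
    "node-c": {"title": "Other", "size": 1024},   # disagreeing node
}
agreed = find_consensus(reports, threshold=2)
```

Canonical serialization matters here: without sorted keys, two nodes that extract identical metadata could still produce different digests and never reach the threshold.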

Smart contract specifications

Parameters

  • indexing_request_expiration: Time the oracle has to reach consensus on the extracted metadata.
  • consensus_threshold: Number of signatures required to successfully index a file in the protocol.
  • index_price: Price in wei, per unit of file size, for storing information in the protocol.
  • max_block_size: Number of documents that can be stored in a single IPDB block. The oracle uses this parameter to decide when to store a new block on IPDB.

Methods

CreateIndexRequest{value: index_fee}(string _cid):

Method that allows anyone to create an index request. The value of the transaction is file_size multiplied by index_price. Each request gets its own id, and this id is emitted.
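
As a rough sketch of the fee rule and id assignment, the following Python model may help. It is not contract code. The class name, the choice of bytes as the unit of file_size, and passing the file size explicitly are all assumptions, since the specification does not yet say how the contract learns the size.

```python
def index_fee(file_size_bytes: int, index_price_wei: int) -> int:
    """index_fee = file_size * index_price (file size in bytes is an assumption)."""
    if file_size_bytes < 0 or index_price_wei < 0:
        raise ValueError("size and price must be non-negative")
    return file_size_bytes * index_price_wei


class IndexRequestBook:
    """Hypothetical in-memory stand-in for the CreateIndexRequest logic."""

    def __init__(self, index_price_wei: int) -> None:
        self.index_price_wei = index_price_wei
        self.next_id = 1
        self.requests = {}   # request id -> cid
        self.emitted = []    # ids "emitted" to listeners

    def create_index_request(self, cid: str, file_size_bytes: int, value_wei: int) -> int:
        required = index_fee(file_size_bytes, self.index_price_wei)
        if value_wei < required:
            raise ValueError(f"index_fee too low: need {required} wei")
        request_id = self.next_id
        self.next_id += 1
        self.requests[request_id] = cid
        self.emitted.append(request_id)
        return request_id


book = IndexRequestBook(index_price_wei=10)
fee = index_fee(2048, 10)  # 2048 bytes at 10 wei each
request_id = book.create_index_request("example-cid", 2048, fee)
```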

ConfirmIndex(uint256 _id_request, string _metadata_cid, bytes[] _signatures):

Method that allows the oracle to store the metadata_cid by sending at least consensus_threshold signatures.
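
The threshold check can be sketched as counting distinct oracle members whose signature verifies. In this illustration HMAC-SHA256 with per-node secrets stands in for the real signature scheme, which the specification does not fix. The message layout and node names are likewise assumptions.

```python
import hashlib
import hmac


def sign(secret: bytes, message: bytes) -> bytes:
    """Stand-in signature: HMAC-SHA256 (an assumption, not the real scheme)."""
    return hmac.new(secret, message, hashlib.sha256).digest()


def confirm_index(request_id, metadata_cid, signatures, oracle_keys, consensus_threshold):
    """Accept only if enough distinct oracle members signed the same message."""
    message = f"{request_id}:{metadata_cid}".encode("utf-8")
    signers = set()
    for signature in signatures:
        for node, secret in oracle_keys.items():
            if hmac.compare_digest(signature, sign(secret, message)):
                signers.add(node)  # a repeated signature counts only once
    return len(signers) >= consensus_threshold


# Hypothetical oracle membership with three nodes.
oracle_keys = {"node-a": b"secret-a", "node-b": b"secret-b", "node-c": b"secret-c"}
message = b"1:example-metadata-cid"
signatures = [sign(oracle_keys["node-a"], message), sign(oracle_keys["node-b"], message)]
accepted = confirm_index(1, "example-metadata-cid", signatures, oracle_keys, 2)
```

Counting signers rather than signatures keeps a single node from reaching the threshold by submitting its own signature several times.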

StoreBlock(string _database_cid, uint256 _block, bytes[] _signatures):

Method that allows the oracle to store the latest version of the database in IPDB, in a specific sector called a block. If necessary, this block can be overwritten when errors occur while a new block is produced.

Events

IndexRequestCreated(address _sender, string _cid, uint256 _id_request):

Event emitted when an account creates a new index request.

NewBlockNeeded(uint256 _nextblock):

Event emitted when max_block_size is reached and the oracle needs to store a new block on IPDB.

NewBlockStored(uint256 _block):

Event emitted when a new block is produced and stored on IPDB.

Admin functions

Admin functions cover the fine-tuning of the protocol (changing the threshold, adding or changing oracle participants, changing the max block size). They will be handled not by a single contract owner but by collecting signatures.

Index flow

  1. A user sends a transaction by calling CreateIndexRequest.
  2. An IndexRequestCreated event is emitted to the network.
  3. The oracle listens for the event and starts searching for the original file.
  4. If the original file is found, it is scanned and its metadata is extracted.
  5. Once the oracle has reached consensus, it sends a ConfirmIndex transaction.
  6. (optional) If max_block_size is reached, the oracle needs to send a StoreBlock transaction before it continues confirming indexes.
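
The flow above can be modelled with a small in-memory simulation. This is a sketch only: the class name, the placeholder CIDs, and block numbering starting at 1 are assumptions, and fee and signature checks are left out to keep the focus on the order of calls and events.

```python
from dataclasses import dataclass, field


@dataclass
class RegistrySketch:
    """Hypothetical in-memory model of the contract state for the index flow."""
    max_block_size: int
    next_id: int = 1
    pending: dict = field(default_factory=dict)  # request id -> original_cid
    index: dict = field(default_factory=dict)    # original_cid -> metadata_cid
    blocks: dict = field(default_factory=dict)   # block number -> database_cid
    events: list = field(default_factory=list)
    docs_in_block: int = 0
    block_needed: bool = False

    def create_index_request(self, sender: str, cid: str) -> int:
        request_id = self.next_id
        self.next_id += 1
        self.pending[request_id] = cid
        self.events.append(("IndexRequestCreated", sender, cid, request_id))
        return request_id

    def confirm_index(self, request_id: int, metadata_cid: str) -> None:
        if self.block_needed:
            raise RuntimeError("StoreBlock required before confirming more indexes")
        original_cid = self.pending.pop(request_id)
        self.index[original_cid] = metadata_cid
        self.docs_in_block += 1
        if self.docs_in_block >= self.max_block_size:
            self.block_needed = True
            self.events.append(("NewBlockNeeded", len(self.blocks) + 1))

    def store_block(self, database_cid: str, block: int) -> None:
        self.blocks[block] = database_cid  # an existing block may be overwritten
        self.docs_in_block = 0
        self.block_needed = False
        self.events.append(("NewBlockStored", block))


registry = RegistrySketch(max_block_size=2)
first = registry.create_index_request("0xSender", "original-cid-1")
second = registry.create_index_request("0xSender", "original-cid-2")
registry.confirm_index(first, "metadata-cid-1")
registry.confirm_index(second, "metadata-cid-2")  # the block is now full
registry.store_block("database-cid-1", block=1)
```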

Complete database pinning

Database pinning is handled by the oracle using the nodes' own storage. Optionally, the database can also be pinned externally by interested parties or by public services such as nft.storage, web3.storage or onchain.storage.

The database itself is divided into "chunks" or "blocks", each containing a batch of indexes. To reiterate, the block size is defined by the protocol through the max_block_size parameter.
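
To make the chunking concrete, here is a minimal sketch of splitting index entries into blocks of at most max_block_size documents. The entry format (pairs of original_cid and metadata_cid) and the placeholder values are assumptions.

```python
def split_into_blocks(entries: list, max_block_size: int) -> list:
    """Group index entries into consecutive blocks of at most max_block_size."""
    if max_block_size <= 0:
        raise ValueError("max_block_size must be positive")
    return [
        entries[start:start + max_block_size]
        for start in range(0, len(entries), max_block_size)
    ]


# Five hypothetical (original_cid, metadata_cid) pairs, two documents per block.
entries = [(f"original-cid-{i}", f"metadata-cid-{i}") for i in range(5)]
blocks = split_into_blocks(entries, max_block_size=2)
block_sizes = [len(block) for block in blocks]
```

Only the last block can be partially filled, which is why a replica needs to re-fetch just the most recent block to stay up to date.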

Querying

Since the database and its blocks are public, anyone interested can make a local copy.

Local copies will be made with a separate command-line tool that handles this task.

We also expect to create a user interface (dApp) for making simple queries in a "Google-like" way.
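
A local word search over a replicated copy could look roughly like the sketch below, which builds an inverted index from words to CIDs. The metadata fields, the tokenization rule, and the placeholder CIDs are assumptions. The actual IPDB full-text extension may work differently.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(metadata_by_cid: dict) -> dict:
    """Map every word found in any metadata value to the CIDs containing it."""
    index = defaultdict(set)
    for cid, metadata in metadata_by_cid.items():
        for value in metadata.values():
            for word in tokenize(str(value)):
                index[word].add(cid)
    return index


def search(index: dict, query: str) -> set:
    """Return the CIDs whose metadata contains every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical local replica: original_cid -> metadata already fetched from IPFS.
replica = {
    "cid-1": {"description": "Whitepaper about decentralized storage"},
    "cid-2": {"description": "Photo archive, decentralized"},
    "cid-3": {"description": "Storage benchmark results"},
}
index = build_inverted_index(replica)
hits = search(index, "decentralized storage")
```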

Metadata standard

A metadata standard (similar to the standards adopted for the web) is needed and will be proposed during the development of the protocol. Anyone will also be able to create a folder with a structure like:

folder_to_index (CID to be sent to contract)
 - metadata.json
 - FILE_1
 - FILE_2
 - FILE_3
 ..........
 - FILE_N

This kind of structure makes it possible to add extra information that can be cached inside the metadata (for example, a short description of the CID).
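
Since the metadata standard has not been proposed yet, the following is purely a hypothetical illustration of what a metadata.json for such a folder might carry. The field names "description" and "files" are assumptions, not part of any agreed format.

```python
import json


def build_metadata(description: str, file_names: list) -> dict:
    """Assemble a hypothetical metadata.json payload for a folder to index."""
    return {
        "description": description,   # short description of the CID
        "files": sorted(file_names),  # sorted so every node emits the same JSON
    }


metadata = build_metadata(
    "Short description of the folder",
    ["FILE_2", "FILE_1", "FILE_3"],
)
metadata_json = json.dumps(metadata, indent=2, sort_keys=True)
```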

Comments and contributions

This is a WIP and any comment is appreciated. You can drop me a line at sebastiano@yomi.digital or DM me on Twitter.