# Set DID for datasets, [GitHub ticket #533](https://github.com/dClimate/dclimate-monorepo/issues/533)
Right now we identify datasets by an id, which is a natural number: the first published dataset got id 1, and the id increases by one each time a new dataset is published. Instead of this, we could add a decentralized identifier (DID) to identify the datasets.

***Brief explanation of DIDs***

DIDs (Decentralized Identifiers) are identifiers that can be used globally, from anywhere and by anybody, meaning that anybody who uses a given DID refers to the same specific thing/object. In our case that would be the datasets, so if we set a DID per dataset we would have a global identifier for our datasets.

I investigated creating our own DID to add to our datasets, and I found it is more difficult than it seems. I thought the DID ticket was mainly about following some rules/indications to create a DID, such as its format, `scheme:method:specific-id`, for example `did:dclimate:dataset1`. But there is more logic behind it, as I sum up in the following points:

* First of all, every DID resolves to a DID document, which holds the information related to the item identified by the DID (like URLs resolve to a specific webpage, DIDs resolve to a specific DID document). Therefore, we would need to generate a DID document every time we create a new DID (https://www.w3.org/TR/did-core/#a-simple-example).
* The way a DID document and its corresponding DID URL are created/resolved/updated/deactivated is defined by the DID method (the second part of the DID string, between the colons). The method determines where this information is created/modified and how. There are various methods registered with the W3C (https://w3c.github.io/did-spec-registries/#did-methods), for example "3", whose registry is the Ceramic network, "sol", whose registry is Solana, "ens" for Ethereum, and many more.
* Another important piece is the DID controller: the person, organization, or autonomous software that has the capability, as defined by the DID method, to make changes to the DID document.

The following picture shows the DID architecture (https://www.w3.org/TR/did-core/#architecture-overview):

![](https://i.imgur.com/RmK54UQ.png)

Therefore, creating our own DID method, such as a dClimate DID, would take more time, because we would need to make important design decisions and then build all of the pieces above. I don't think it is something we could create in a short period of time.
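To make the format and the document concrete, here is a minimal sketch in TypeScript. It uses the hypothetical `did:dclimate:dataset1` identifier from above (a `dclimate` method does not exist and is exactly what option 1 below would require us to design), and the document shape follows the simple example in the did-core spec linked above.

```typescript
// Sketch only: "did:dclimate:dataset1" is the hypothetical example from
// above; a "dclimate" DID method does not exist and would have to be
// designed (option 1 in the list below).

// scheme : method : method-specific-id
const datasetDid = 'did:dclimate:dataset1'
const [scheme, method, specificId] = datasetDid.split(':')
// scheme === 'did', method === 'dclimate', specificId === 'dataset1'

// Rough shape of the document the DID would resolve to, following the
// simple example in https://www.w3.org/TR/did-core/#a-simple-example
interface DidDocument {
  '@context': string[]
  id: string          // the DID itself
  controller?: string // who is allowed to update this document
  authentication?: unknown[]
}

const datasetDidDocument: DidDocument = {
  '@context': ['https://www.w3.org/ns/did/v1'],
  id: datasetDid,
  // Hypothetical controller: dClimate as the organization that may
  // update the dataset's document.
  controller: 'did:dclimate:dclimate',
}
```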

**Summing up, ways to proceed in our case to apply DIDs to datasets:**

1. Create our own DID method: I think this option would take a lot of time for the benefit we would obtain from it. The creation of a DID method is not as simple as it seems, because we would need to design/create all of the elements of the DID architecture described above.
2. Use an existing DID method to refer to the datasets: I think this option would be good for us, as we could assign DIDs to identify the datasets without creating a new method from scratch. It would also be useful if we deploy on different blockchains, like Ethereum and Polygon, and keep the same DID for the same dataset on both, which would be great. The registered methods can be found here: https://w3c.github.io/did-spec-registries/#did-methods

   I took a look and played around with Ceramic, and it could work for our case. We would create a new DID document each time we create a dataset, which generates a DID pointing to that document and therefore to the dataset. Using Ceramic, it would also be possible to create a DID document with the bookmarks for a specific user account, where we could store the user's bookmarked datasets and update the document when the user unbookmarks one (see the sketches after this list). I think this could probably be achieved with other DID methods too; I still need to investigate and compare the rest.
3. Leave it as we have it now, with the integer id and no DIDs. This works, but if we end up using a DID method to manage the bookmarked datasets, we could try to use the same DIDs to identify the datasets.
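As a first impression of option 2, here is a minimal sketch of what creating one document per dataset on Ceramic could look like. It assumes the `@ceramicnetwork/http-client`, `@ceramicnetwork/stream-tile`, `dids`, `key-did-provider-ed25519` and `key-did-resolver` packages and the public Clay testnet endpoint; `createDatasetDocument` and the content fields are hypothetical names, not anything we have in the monorepo today.

```typescript
import { CeramicClient } from '@ceramicnetwork/http-client'
import { TileDocument } from '@ceramicnetwork/stream-tile'
import { DID } from 'dids'
import { Ed25519Provider } from 'key-did-provider-ed25519'
import { getResolver } from 'key-did-resolver'
import { randomBytes } from 'crypto'

async function createDatasetDocument(name: string): Promise<string> {
  // Assumes a reachable Ceramic node; the Clay testnet endpoint is used
  // here purely for illustration.
  const ceramic = new CeramicClient('https://ceramic-clay.3boxlabs.com')

  // Authenticate with a did:key DID so we are allowed to write streams.
  // In production the seed would come from our key management, not from
  // randomBytes on every call.
  const provider = new Ed25519Provider(randomBytes(32))
  const did = new DID({ provider, resolver: getResolver() })
  await did.authenticate()
  ceramic.did = did

  // One document per dataset. The returned stream ID is the stable,
  // global reference we would store instead of the incremental integer
  // id; how exactly it maps to a resolvable DID (e.g. via the "3"
  // method) is part of what still needs to be compared.
  const doc = await TileDocument.create(ceramic, { name, publishedAt: Date.now() })
  return doc.id.toString()
}
```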
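And a sketch of the bookmarks idea, under the same assumptions: one hypothetical document per user account listing bookmarked dataset identifiers, updated when the user unbookmarks one.

```typescript
import { CeramicClient } from '@ceramicnetwork/http-client'
import { TileDocument } from '@ceramicnetwork/stream-tile'

// Hypothetical per-user bookmarks document: a stream holding the
// identifiers of the datasets that the user bookmarked.
async function unbookmarkDataset(
  ceramic: CeramicClient,
  bookmarksStreamId: string,
  datasetId: string,
): Promise<void> {
  const doc = await TileDocument.load<{ datasets: string[] }>(ceramic, bookmarksStreamId)
  const remaining = doc.content.datasets.filter((id) => id !== datasetId)
  // update() writes a new commit to the stream; only the authenticated
  // controller DID of the stream is allowed to do this.
  await doc.update({ datasets: remaining })
}
```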

I would go for option 2, as there are plenty of existing DID methods and we could use one of them to identify the datasets.