# Ramblings about public keys that were supposed to be a Twitter thread

Hello, @gimre here.

We should probably change the way we *display* public keys in all possible applications/tools/etc.

But before I dive into this, let's refresh your memory about addresses. This part is Symbol oriented, but it's also true for NEM.

## Addresses (again)

Addresses in Symbol consist of 3 parts:

* 1-byte network prefix (0x68 byte in mainnet)
* 20-byte hash of a public key
* 3-byte checksum

Trivia: the checksum validates all 21 bytes prior to it, so because the network byte differs, a single public key will have different checksums in different networks. Example:

public key: `ADFAE6DA2A9776928DDD6F9EFE319C923C74DC878F9AE8A3A5D2E0B537A88FE6`

| network | address bytes as hex |
| --- | --- |
| mainnet | **`68`** `D2C389D2B9FE827D60FCE68934A3B7089621634C` **`30E7A9`** |
| testnet | **`98`** `D2C389D2B9FE827D60FCE68934A3B7089621634C` **`EA450A`** |

We use base-32 to encode addresses, so the two above become 39-character strings:

* **`N`** `DJMHCOSXH7IE7LA7TTISNFDW4EJMILDJ` **`QYOPKI`**
* **`T`** `DJMHCOSXH7IE7LA7TTISNFDW4EJMILDJ` **`TVEKCQ`**

Another bit of trivia: we deliberately decided to use base-32, and the reasons for that are pretty simple:

* base-32 is clearly shorter than displaying as hex (39 chars vs 48 chars)
* base-32 does not contain digits that can be mistaken for letters (0, 1, 8, which depending on the font can look like O, I or l, and B respectively)
* base-32 can be selected with a double click
* (not really a reason, but because base-32 is a 5-bit encoding, it's trivial to calculate how many chars will be needed)

Let's take a look at some alternatives.
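Before the alternatives, a quick sanity check of the base-32 numbers above. This is a minimal sketch using only Python's standard library; `encode_address` and `base32_length` are names made up for this post, not part of any Symbol SDK. It takes the raw address bytes from the table as given and does not recompute the hash or the checksum.

```python
import base64

# Raw 24-byte addresses copied from the table above:
# network byte + 20-byte public key hash + 3-byte checksum.
ADDRESSES_HEX = {
    "mainnet": "68" "D2C389D2B9FE827D60FCE68934A3B7089621634C" "30E7A9",
    "testnet": "98" "D2C389D2B9FE827D60FCE68934A3B7089621634C" "EA450A",
}


def encode_address(raw: bytes) -> str:
    """Base-32 encode raw address bytes and drop the trailing '=' padding."""
    return base64.b32encode(raw).decode("ascii").rstrip("=")


def base32_length(num_bytes: int) -> int:
    """Each base-32 character carries 5 bits, so round the bit count up."""
    return (num_bytes * 8 + 4) // 5


for network, hex_string in ADDRESSES_HEX.items():
    encoded = encode_address(bytes.fromhex(hex_string))
    print(network, encoded, f"({len(encoded)} chars vs {len(hex_string)} hex chars)")
```

The same `base32_length` arithmetic gives 52 characters for a 32-byte public key, which matters later on.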
The two addresses above using base-58 encoding (used in bitcoin):

* `AZFm5VUFibKz8mD7pvyUV92pzUzscNW1W`
* `Ew4CHYiGBvsGH6rJy7r4m17kxig8cNEPb`

As you can see, because base-58 does not have a fixed bit width, the two encodings look completely different. This could be fixed by moving the network byte to just before the checksum, in which case we'd get:

* `LDR9XfL7pKg8BdUavb7tA9FNFnL` **`UBCKje`**
* `LDR9XfL7pKg8BdUavb7tA9FNFnL` **`VRRzTb`**

The _beef_ we had with base-58 is that:

1. the produced output is case-sensitive 🙅‍♀️
2. due to the way it works, it requires representing the data to encode as a big number (and operations on that big number), and we didn't like that 🤷
3. it's variable length, so the resulting address can be anything from 24-33 characters (this is a minor issue, as we're talking only about *display*, not internal information)

Finally, the other quite obvious alternative is base-64 encoding. The two addresses from earlier would be:

* **`a`** `NLDidK5/oJ9YPzmiTSjtwiWIWNM` **`MOep`**
* **`m`** `NLDidK5/oJ9YPzmiTSjtwiWIWNM` **`6kUK`**

Issues:

1. the biggest "no-no" for base-64 is that in most OSes you can't easily select the whole string, as it can contain `/` (as visible above) and `+`
2. obviously it's case-sensitive 🙅‍♀️
3. worse, it contains similar-looking characters (`oO0`, `iIl1`, `8B`)

But it's not addresses I wanted to talk about, so let's go back to public keys.

## Public keys

In public-key cryptography, a key pair refers to a pair of keys: a private key plus the corresponding public key.

Right now in Symbol there's only one type of public key: public keys that are part of ed25519 key pairs. We use the so-called compressed public key format, which is 32 bytes long.
We usually display it in hex, as shown in the previous section:

* `ADFAE6DA2A9776928DDD6F9EFE319C923C74DC878F9AE8A3A5D2E0B537A88FE6`

The problem with this is that **without context**, this could be anything:

* public key
* transaction hash (or block hash, or any hash for that matter)
* or in the worst case: private key 😬😱

Moreover, some public keys are tied to actual accounts on chain and some are not.

That's why I'm proposing to encode public keys in some way. I don't have a ready solution for what the encoding should be, but there are a few things worth considering:

1. The encoding obviously does not need a network byte, because in the case of a public key we usually don't care which network it is from (moreover, it's only about _display_)
2. The encoding should probably include information about _purpose_, so it should be visible right away if we're dealing with a _regular_ public key or maybe a _VRF_ linked key
3. The encoding should be extensible, meaning that if we have some **other**, longer keys in the future, they should be handled as well

This is not a fleshed-out proposal, as I'd rather have people discuss it, but points 2 and 3 can be solved easily by prefixing the encoded public key with `<identifier>$`, where the identifier could be a number or a character, e.g.:

| identifier | meaning |
| --- | --- |
| 1 | generic purpose - normal account or remote account public key |
| 2? r? | remote public key |
| n | node public key |
| T | voting public key |
| v | vrf public key |

So let's assume for a bit that we'd also use base32 for encoding. Here are a few samples of how it could look:

| type | public key | encoded public key (as displayed) |
| --- | --- | --- |
| account | `37A7EBEBABD0D47597C1E56160786686643FCB97731C67B543FF714C6CB77F46` | `1$G6T6X25L2DKHLF6B4VQWA6DGQZSD7S4XOMOGPNKD75YUY3FXP5DA` |
| remote account | `FF8F64B3C0191C385070D410ADED3BFFDF3BEA7C6FB0B50489999168FABB871B` | `2$76HWJM6ADEODQUDQ2QIK33J377PTX2T4N6YLKBEJTGIWR6V3Q4NQ` |
| node public key | `AAB034B654D50E53E2C6DAD4FE3E0088A59AA16F5EBB2835B282BE94DCDA8F96` | `n$VKYDJNSU2UHFHYWG3LKP4PQARCSZVILPL25SQNNSQK7JJXG2R6LA` |
| voting public key | `057C39E1F970C40C53090503774FFD379EF4681B87C2CF4F914E56C2DABC4BC4` | `T$AV6DTYPZODCAYUYJAUBXOT75G6PPI2A3Q7BM6T4RJZLMFWV4JPCA` |
| vrf public key | `6384EA239A3974B5D874B1A244D96391872BF79C11D8815F3CEA83588C722D52` | `v$MOCOUI42HF2LLWDUWGREJWLDSGDSX544CHMICXZ45KBVRDDSFVJA` |

Note: the reason to introduce `$` between the identifier and the 'data' is that this way one can still quickly select the 'data' part and decode it _somewhere_. Using `$`, while handy in some cases, might be a bad idea, so it could also be changed to something else that allows selection, or removed completely.

This is how it could look on the explorer:

![](https://hackmd.io/_uploads/Byz3IuqEh.png)
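
To make the `<identifier>$` idea a bit more concrete, here's a minimal sketch of how the samples in the table could be produced and read back. It assumes unpadded base-32 for the 'data' part and `$` as the separator, both of which are still up for discussion. `encode_public_key` and `decode_public_key` are hypothetical helper names, not an existing SDK API.

```python
import base64
from typing import Tuple

SEPARATOR = "$"  # assumption: the separator from the note above, may change


def encode_public_key(identifier: str, public_key_hex: str) -> str:
    """Prefix the unpadded base-32 form of a public key with its purpose identifier."""
    raw = bytes.fromhex(public_key_hex)
    data = base64.b32encode(raw).decode("ascii").rstrip("=")
    return f"{identifier}{SEPARATOR}{data}"


def decode_public_key(encoded: str) -> Tuple[str, str]:
    """Split off the identifier and return (identifier, public key as upper-case hex)."""
    identifier, data = encoded.split(SEPARATOR, 1)
    # b32decode expects the input length to be a multiple of 8, so restore the padding.
    padding = "=" * (-len(data) % 8)
    raw = base64.b32decode(data + padding)
    return identifier, raw.hex().upper()


account_key = "37A7EBEBABD0D47597C1E56160786686643FCB97731C67B543FF714C6CB77F46"
displayed = encode_public_key("1", account_key)
print(displayed)
print(decode_public_key(displayed))
```

Nothing here depends on the key being 32 bytes, so a longer key type in the future would only need a new identifier.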