---
title: Silicon Salon IV Notes 2023-05-03
tags: Notes
---

# Silicon Salon Notes 2023-05-03

#### 2023-05-03 9am PDT

Slides (follow along): https://hackmd.io/@bc-silicon-salon/BkoVLgkNn

These Notes: https://hackmd.io/fqvhrHMWTYee_X5o-_VUkQ

## Attendees

NOTE: This document is public. Don't add your name here, or any private contact details, if you don't want them to be listed as a participant in the final output from this Silicon Salon. [Chatham House rules apply](https://hackmd.io/@bc-silicon-salon/BkoVLgkNn#/9) for all quotes.

* Christopher Allen (@ChristopherA)
* Luke Leighton (Libre-SOC, RED Semiconductor Ltd)
*
*

## Notetaking

## Anti-exfil: Preventing key exfiltration through signature nonce data

ChristopherA: Andrew Poelstra is Director of Research at Blockstream, which has a team of cryptographers, bitcoin developers, and network engineers doing extensive research on the bitcoin protocol. He has invented several bitcoin scalability and privacy technologies including taproot, musig, scriptless scripts, adaptor signatures, and sidechains, and I had the honor of working with him for several years while I was at Blockstream.

Andrew: At Blockstream Research, we do cryptography focusing on signatures and variants of signatures. We also work on smart contract language design. Today I am going to talk about a signatures project we have been working on for a few years called anti-exfiltration.

### EC signatures

I thought it might be funny to open the first talk of the morning with a bunch of equations. This is the only equation I will show: s = k + e*x (in verification form, s\*G = k\*G + e\*x\*G). This is the equation for a Schnorr signature. What I want you to take away is not the exact formula, or what the plus and multiplication operations are, but rather that an elliptic curve (EC) signature, whether ECDSA or Schnorr, is a linear equation in two secret pieces of data. One piece of secret data is your actual secret key, which corresponds to your public key; that's permanent. For each signature, you also generate an ephemeral value that we call a nonce; you can think of it as an ephemeral key.

The premise here is that every signature represents an equation, and from grade 9 linear algebra you might remember that if you have a system of linear equations with as many equations as unknowns, you can solve it; with more equations than unknowns, it may be overdetermined or inconsistent; and with fewer equations than unknowns, you also can't solve it because there is an infinity of possible solutions. As long as every signature equation we publish comes with another unknown (always one more unknown than equations), nobody can extract your secret information, because there are 2^256 possible solutions and nobody can distinguish your key from anyone else's.

Every signature uses your secret key, so one immediate consequence is that if you don't generate a new nonce and instead reuse a nonce from a previous signature, someone can invert the matrix and do a direct computation to get your secret key. Most people are familiar with the idea that you should not reuse nonces when doing EC signatures. Also, you shouldn't use two nonces that are related by a known offset.
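As a toy, scalar-only illustration of that nonce-reuse point (my own sketch, not from the talk), treat each Schnorr-style signature as the linear equation s = k + e*x mod n. Two signatures that share the same nonce k give two equations in only two unknowns, which anyone can solve for the secret key x:

```python
# Toy, scalar-only illustration (not from the talk): why nonce reuse is fatal.
# Model a Schnorr-style signature as the linear equation s = k + e*x (mod n),
# where x is the secret key, k the nonce, and e the per-message challenge.
# Curve points are omitted entirely; this is just the linear algebra.

import secrets

n = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141  # secp256k1 group order

x = secrets.randbelow(n)        # permanent secret key
k = secrets.randbelow(n)        # nonce, WRONGLY reused for two signatures
e1, e2 = secrets.randbelow(n), secrets.randbelow(n)   # challenges for two different messages

s1 = (k + e1 * x) % n           # two published equations...
s2 = (k + e2 * x) % n           # ...in only two unknowns (k, x)

# Anyone can now solve the 2x2 linear system:
#   s1 - s2 = (e1 - e2) * x   =>   x = (s1 - s2) / (e1 - e2)  (mod n)
recovered_x = (s1 - s2) * pow(e1 - e2, -1, n) % n
assert recovered_x == x
print("secret key recovered from two signatures sharing a nonce")
```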
Even if you never reuse a nonce, slight deviations from uniform randomness are still a problem. Given enough signatures, you can use lattice techniques dating back to the early 2000s; Breitner implemented this in 2019 and was able to extract a bunch of secret keys from signatures on the bitcoin blockchain by exploiting nonces that were not uniformly random. If you use HMACs or other clever tricks, you can even bias the nonces in a way that only a specific attacker who knows the HMAC key can observe the bias and extract the keys, while to everyone else the nonces look uniform. It is absolutely essential that nonces are generated uniformly at random.

If a hardware wallet is generating the randomness, the user doesn't really have any way to verify this. From the user's point of view, the hardware wallet is a black box that produces signatures. If you think your hardware wallet is biased, you can try to break your signatures yourself, but short of that there are not a lot of options.

There are a few solutions to this issue. The standard solution to these nonce-choice issues is deterministic nonces, as in RFC 6979. You have a hash function, you have a secret key, and you generate the nonce that way. If you use a hash function like SHA2, which empirically seems to have uniform output, then great: you will never reuse a nonce, because you always feed the message you're signing into the input, so changing the message changes the nonce too. And you won't have any bias unless the hash function is broken. So you can use deterministic nonces. But as a user of a hardware wallet, the wallet is a black box; maybe they're using RFC 6979 or maybe they're not, and as a user there's no real way for you to tell. Maybe you can try to reproduce the signatures and see whether they happen to match the deterministic algorithm you expect. Or maybe they are feeding in ephemeral randomness, which is a good idea, but then you can't verify it either.

In the crypto world, whenever you have a problem where nobody can verify something without getting access to secret data that you don't want them to have, the standard solution is a zero-knowledge proof. ZKPs are a cryptographic construction where you are able to prove that some statement is true, such as "this signature was generated using a specific deterministic nonce algorithm", without revealing your secret key or your nonce; it is all kept private. Until 5 or 6 years ago, ZKPs were very expensive to produce; they take elaborate cryptography and the implementations are complex. Today, thanks to a lot of research from various people in the cryptocurrency world, it's practical to produce zero-knowledge proofs on commodity compute hardware. Nobody expected this 10 years ago. If you're on a secure element or in a hardware wallet, though, it's still not practical to generate these kinds of proofs. Another problem is that these proofs work in a way where you have to protect some permanent secret data, and you use ephemeral random data to protect it. So the ZKPs themselves have nonces. If you don't have a ZKP that the ZKP was produced legitimately, you have an infinite regress: nonces all the way down. The ZKP doesn't go on the blockchain, so that's nice. But we would like a better, more efficient solution that doesn't itself have its own nonces.
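A simplified sketch of the deterministic-nonce idea mentioned above (my own illustration; the real RFC 6979 construction uses an HMAC-based DRBG with retry logic, not a single hash): deriving the nonce from the secret key and the message means the nonce changes whenever the message does, and never repeats for the same inputs.

```python
# Simplified sketch of the deterministic-nonce idea (NOT the real RFC 6979
# construction, which uses an HMAC-DRBG): derive the nonce from the secret key
# and the message, so a different message always yields a different nonce, and
# the same (key, message) pair never produces two different nonces.

import hashlib

n = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141  # secp256k1 group order

def deterministic_nonce(secret_key: int, message: bytes) -> int:
    """Hash the key and message together and reduce mod the group order."""
    h = hashlib.sha256(secret_key.to_bytes(32, "big") + message).digest()
    return int.from_bytes(h, "big") % n

k1 = deterministic_nonce(12345, b"message one")
k2 = deterministic_nonce(12345, b"message two")
assert k1 != k2                                            # different messages -> different nonces
assert k1 == deterministic_nonce(12345, b"message one")    # and it is repeatable
```

As the talk points out, though, nothing about this is visible from outside the device: a user still cannot tell whether the black box really derives its nonces this way.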
One neat idea: suppose, for the rest of this talk, that the host computer and the hardware wallet are not simultaneously compromised and colluding (if they are, nothing stops them biasing nonces no matter what you do). What if the host has its own key, and the hardware wallet has its own key? They could produce a multisignature. In an EC multisig, the two participants combine their keys at key setup time, and then combine their nonces at signing time. They both contribute some randomness. If one party tries to bias its nonce, the other party's randomness re-randomizes the result, and then nobody can exploit the bias.

This requires the host to have a key. A typical bitcoin wallet workflow requires the hardware wallet to have a seed and deterministically derive keys from the seed; the idea is that you can generate an infinite stream of keys that any other hardware wallet would be able to reproduce. You could also do this with a second hardware wallet. There are benefits to that regardless of nonce bias: with two hardware wallets from two different vendors, even if one of them is compromised and biasing its nonces, you're still protected. But now you have twice the key management problem. You have more keys to back up. With two hardware wallets, you have to buy more hardware. The protocols also aren't so mature, and there's some implementation complexity. If you only have one hardware wallet and you want to use a host as one side of this, well, host computers are generally not designed to securely store cryptographic material. A variant is that instead of a strong EC key you could use a passphrase, but that adds user complexity: there could be ransomware, or you could lose your keys.

What's my solution? It takes the multisig premise, that you have two participants who mix their randomness together and thereby avoid biases, but uses a trick: for the purposes of key exfiltration through this nonce side channel, you don't need to worry about key bias. At key setup time you do want to make sure your key is not compromised, or coming from some Debian RNG that only has 32 bits of entropy or something silly like that, but empirically it appears that you don't care about small amounts of bias in your key. It's only in the nonce where even a small bias is the end of the world. So we do a weak form of multisignature: rather than combining the keys, we just combine the nonces at signing time. The cool thing is that since the nonce is ephemeral, you can throw it away after use. The host computer can generate randomness (not even particularly good randomness), pass it to the hardware wallet; the hardware wallet produces a signature, mixing in this randomness in a way that the host is able to verify; and then the host throws the randomness away.

We call our solution anti-exfil, or anti-exfiltration. We originally called it anti-klepto: kleptography is the field of cryptography related to trying to steal information through secret side channels. I encourage you to check the Wikipedia page; it's obscure 90s cypherpunk stuff. We changed the name to anti-exfil because exfiltration is a more widely known term.
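As a toy, scalar-only illustration of the nonce-mixing idea (my own sketch, not from the talk): if the effective signing nonce is the sum of a share from each party, a bias injected by one side alone does not survive, because the other side's uniform share re-randomizes the sum. In a real multisignature the parties would exchange nonce points rather than the scalar shares themselves, which is part of the protocol complexity mentioned above.

```python
# Toy sketch (scalar-only) of the nonce-mixing idea behind the multisignature
# approach: host and hardware wallet each contribute a nonce share, and the
# effective signing nonce is their sum mod n. If only one side is biased,
# the other side's uniform share makes the sum uniform again.

import secrets

n = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

def biased_nonce() -> int:
    """A maliciously biased nonce share: top 128 bits forced to zero."""
    return secrets.randbelow(1 << 128)

k_wallet = biased_nonce()          # a compromised wallet tries to leak via bias
k_host   = secrets.randbelow(n)    # an honest host contributes real randomness

k_effective = (k_wallet + k_host) % n
# k_effective inherits the host's entropy, even though the wallet's share was biased.
```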
The premise is that the host provides a random challenge to the hardware wallet, and the wallet tweaks its nonce in a way that commits to the challenge and in such a way that the host can verify the tweak; this also re-randomizes the nonce and eliminates bias. We stick a hash function in there, so even if the host's randomness isn't that high quality, we still get a complete re-randomization. As long as the attacker hasn't compromised both the hardware wallet producing the original untweaked nonce and the host, no information can be extracted this way. I'll jump to the very end now; we have a few bonus slides on the technical issues.

https://blog.blockstream.com/anti-exfil-stopping-key-exfiltration/
https://github.com/opentimestamps/python-opentimestamps/pull/14

This is implemented on Blockstream Jade, which is a hardware wallet that we manufacture. When I was researching for this talk, I found an old implementation from 2017 where I tried to insert this into OpenTimestamps, which uses commitment schemes. OpenTimestamps is where you take some data you want to prove existed before a certain date, commit to it in a giant merkle tree, and put a commitment to the merkle root into the bitcoin blockchain. Since bitcoin has strong timestamps that are expensive to forge, you get a commitment to your data in the bitcoin blockchain proving that the data existed before that bitcoin block, which gives a one-way proof of timestamping. In 2017 I was thinking: what if you use this nonce-tweaking trick to tweak not by random data but by some data that I want to timestamp? Then you get this anti-exfil technique for free. If you trace through all the commitment structures, the signature itself commits to the merkle tree, so you get OpenTimestamps behavior for free, you can't distinguish an OpenTimestamps commitment from anything else, and the blockchain space used goes from 32 bytes down to 0 bytes. Just a neat piece of history on the anti-exfil work.

Jade does not have a secure element. The way Jade works, there are two modes. One is where you don't store key material and you type in your seed material every single time you turn on your device. In the other mode, we use another crypto technique where we store the key material encrypted against the user's PIN; you need interaction between the user and our PIN server (and you can run your own PIN server if you want), the PIN server is able to enforce a limit on PIN attempts, and the PIN server doesn't know what the PIN is. We are able to outsource that to some other server; it's another protocol, and it's open-source. Jade itself is written in C. For anti-exfil, the protocol is a bit of an ad hoc thing, but you can read it in the source code, and I think we have some design documentation. This is implemented in libsecp256k1-zkp, which is called from our GDK library, which is used by Jade. GDK is our general-purpose wallet library.

It requires modifications to PSBT, like we need for MuSig, and we might even be able to reuse some of the MuSig fields for PSBT. To do this protocol, rather than having a half-round of interaction where the host gives the hardware wallet a transaction to sign and receives back a signature, we add one more round of interaction: the host sends a challenge, gets a nonce back, and then the host sends its randomness and gets a signature back. So you would need one extra field. PSBT might not be the right layer to implement this; maybe it is. PSBT is generally used for multiple participants all producing a transaction together, and it's also used by hardware wallets as a way for the host to communicate with the hardware wallet. In the hardware wallet case it's one participant, but sort of split: the host and hardware wallet are the same person, so they can afford to do more ad hoc things in passing data back and forth. We haven't proposed an extension to PSBT.
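Below is a self-contained sketch of that tweak-and-verify flow (my own toy Python, using a Schnorr-style equation and minimal affine curve arithmetic; it is not the libsecp256k1-zkp implementation, which targets ECDSA and, as the linked blog post describes, also has the host commit to its randomness up front):

```python
# Sketch (toy code, not Blockstream's implementation) of the anti-exfil
# nonce-tweak idea: the wallet's nonce point R0 is tweaked by a hash of R0 and
# host-supplied randomness, the wallet signs with the tweaked nonce, and the
# host checks that the signature's nonce point really is R0 + H(R0, t)*G.
# Minimal affine secp256k1 arithmetic; Schnorr-style s = k + e*x signature.

import hashlib, secrets

P = 2**256 - 2**32 - 977                                                 # field prime
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141  # group order
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def point_add(a, b):
    if a is None: return b
    if b is None: return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0: return None
    if a == b: m = (3 * x1 * x1) * pow(2 * y1, -1, P) % P
    else:      m = (y2 - y1) * pow(x2 - x1, -1, P) % P
    x3 = (m * m - x1 - x2) % P
    return (x3, (m * (x1 - x3) - y1) % P)

def point_mul(k, pt=G):
    result = None
    while k:
        if k & 1: result = point_add(result, pt)
        pt, k = point_add(pt, pt), k >> 1
    return result

def H(*parts) -> int:
    return int.from_bytes(hashlib.sha256(b"".join(parts)).digest(), "big") % N

def enc(pt) -> bytes:
    return pt[0].to_bytes(32, "big") + pt[1].to_bytes(32, "big")

# --- wallet state (secret) ---
x   = secrets.randbelow(N - 1) + 1      # secret key
pub = point_mul(x)
msg = b"spend 1 BTC to ..."

# Round 1: host sends a random challenge t, wallet replies with its nonce point R0.
t  = secrets.token_bytes(32)            # host-provided randomness
k0 = secrets.randbelow(N - 1) + 1       # wallet's own (possibly biased!) nonce
R0 = point_mul(k0)

# Round 2: wallet tweaks the nonce so it commits to t, then signs.
tweak = H(enc(R0), t)
k = (k0 + tweak) % N
R = point_mul(k)
e = H(enc(R), enc(pub), msg)
s = (k + e * x) % N

# Host-side check: the nonce actually used must equal R0 tweaked by H(R0, t).
assert point_add(R0, point_mul(H(enc(R0), t))) == R
assert point_mul(s) == point_add(R, point_mul(e, pub))   # ordinary verification
```

The host-side check is the whole point: once the wallet has revealed R0, the hash pins down the final nonce, so a compromised wallet has no remaining freedom to grind a biased nonce after seeing the host's randomness.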
Chosen nonce attack workbook: https://github.com/stepansnigirev/chosen_nonce_demo/blob/master/HD_key.ipynb

## Scalar and Vector Biginteger math in an ISA

* https://ftp.libre-soc.org/siliconsalon2023.pdf
* https://libre-soc.org/openpower/isa/svfixedarith/
* https://opentitan.org/book/hw/ip/otbn/doc/isa.html
* https://libre-soc.org/openpower/sv/biginteger/analysis/
* https://libre-soc.org/openpower/sv/biginteger/mulmnu.c/
* https://libre-soc.org/openpower/sv/rfc/ls003/

Open silicon costs money to manufacture. The solution that Libre-SOC found was to create RED Semiconductor Ltd, a commercial organization that mirrors what has been an open-source project for the last 4 years. The simple reason for this is money: it takes about £10 million to get a chip manufactured. Through Libre-SOC we are developing a leading-edge, next-generation microprocessor architecture with vectorization, all developed in the open. To get it into silicon that we can supply to a market or to the open-source community, you need to take the poison pill of going commercial, simply because of the costs. People might want to ask me about that later. I'm the opening act; Luke will give the presentation on the technology.

Who are we? Libre-SOC is researching and designing instructions which will be proposed to the OpenPOWER ISA working group for official inclusion in the Power ISA. There exists a sandbox area, but we want this to be part of the mainstream Power ISA.

What are the challenges faced by big integer math? By big integer math I mean elliptic curve, Diffie-Hellman, RSA, etc., the whole lot. With the whole post-quantum thing, algorithms get undermined, and as Andrew mentioned there are also zero-knowledge proofs, where people are constantly iterating and improving. If you put one of those into silicon, it will typically take 3 years; actual FIPS certification takes 5-7 years. If at any point you get superseded or a fault is found, that's 3 years' worth of money down the drain. Things like AES have stood the test of time, and it's worthwhile to put those into circuits. The other issue is that with 32-bit or 64-bit registers, the performance sucks. The first temptation is to add SIMD instructions, and the second temptation is to do custom instructions, and now the problem is worse, because you have hardly-used compiler toolchains that are specific to that system, and they are not general-purpose instructions. You then have a nightmare of software toolchain complexity for a small, dedicated task.

How can we solve this? We want everything to be general purpose: general-purpose instructions that end up in mainstream compilers, mainstream software toolchains, and libraries. We looked at this and said, let's go back to the algorithms: start with Knuth's algorithm D and algorithm M, Karatsuba, and so on. The first thing to note about SVP64, which is the vector extension for the Power ISA that we have been developing at Libre-SOC, is that it has looping as a primary construct. There are similar existing instructions: it says, I want you to loop on the next instruction, using the next instruction as a template. It's radically different from SIMD and Cray-style vectorizers. We are using scalar registers here; we're not doing dedicated vector registers.
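A rough conceptual model of that loop-on-the-next-instruction idea (my own sketch, not Libre-SOC's simulator or the real SVP64 encoding): a prefix carrying a vector length VL repeats one scalar instruction VL times, stepping each register operand through consecutive scalar registers.

```python
# Rough conceptual model (not the Libre-SOC simulator) of SVP64-style vector
# prefixing: a prefix carrying a vector length VL repeats the next scalar
# instruction VL times, stepping each register operand through consecutive
# scalar registers. The scalar instruction itself is unchanged.

def sv_loop(scalar_op, regs, rt, ra, rb, VL):
    """Apply a 2-in, 1-out scalar op VL times over consecutive registers."""
    for i in range(VL):
        regs[rt + i] = scalar_op(regs[ra + i], regs[rb + i])

# Example: an ordinary 64-bit scalar add, looped to act on 4-element vectors.
MASK64 = (1 << 64) - 1
regs = [0] * 128                      # the planned core has 128 64-bit registers
regs[8:12]  = [1, 2, 3, 4]
regs[16:20] = [10, 20, 30, 40]
sv_loop(lambda a, b: (a + b) & MASK64, regs, rt=0, ra=8, rb=16, VL=4)
assert regs[0:4] == [11, 22, 33, 44]
```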
Using this vector looping concept, you get, for free, an arbitrary-length vector negate. The next bit: carry-in and carry-out for add is 1 bit, but how can we do 64-bit carry-in and carry-out to create a vector-scalar operation? If you have a scalar-by-vector operation, you can just put a loop around it, and then you have Knuth's algorithm D and algorithm M. If you look at other ISAs, the irony is that, with the exception of Intel's SHLD instruction and maybe Intel's mulx, they typically drop half of the result on the floor. The multiplier will drop either the high half or the low half of the result, and divide will drop either the modulo or the quotient, and you need two instructions to get both. That is, except for x86: the typical RISC ISA will need 2 instructions to get the whole thing.

### Turning add-with-carry into a vector-add operation

The Power ISA has an instruction called add-with-carry. If we chain them together, the carry-in of the first one becomes a standard carry-in for the LSB of your big-integer operation. The carry-out from your 64-bit add is used as the carry-in of the next add, and you chain them all together. Part of what I'm doing here is walking through a chain of logically obvious things, just very quickly. If we can express this in a way where you can specify how many of these operations you want to do in a chain, then you have one instruction that does vector-vector add. Real simple, right?

Vector-scalar shift is more complex. If you want to shift by 64 bits or more, you just select which scalar register you shift from. Less than 64 bits? A normal shift puts in 0s: if you are shifting left, you put 0s in the LSBs at the bottom and throw away the bits at the top (and correspondingly for shift right). When you start doing big-integer math, you actually need to fill in those bits at the bottom. A naive thought is that you would just do an operation like un[i] >> s, and that would seem ideal. It turns out, though, that you then need a full vector copy, because you can't do the operation in place; and if you're doing 4096-bit arithmetic in a register file, that's a lot of registers tied up. What we did instead: the bits that would normally be set to 0 are sourced from a second input register. That's what gets shifted in, and effectively it's a 64-bit carry-in. The bits that are normally thrown away by a shift instruction instead go into a second output, and that output becomes the chain into the next instruction. Again, using the vector prefixing, you repeat that, chain things together, and you have done an in-place vector-by-scalar shift.
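Here is a sketch of the 64-bit carry chaining just described, for both the add and the shift (my own model of the semantics, not the official Power ISA pseudocode):

```python
# Sketch of the 64-bit carry chaining described above (a model, not the
# official Power ISA pseudocode). Each element is a 64-bit limb, least
# significant first. For add, the 1-bit carry-out of each add-with-carry feeds
# the next element; for shift, the bits that a plain shift would throw away
# become a 64-bit carry into the next element, so the whole vector can be
# shifted without a temporary copy.

MASK64 = (1 << 64) - 1

def adde(a: int, b: int, carry_in: int):
    """One scalar add-with-carry: (64-bit result, 1-bit carry-out)."""
    total = a + b + carry_in
    return total & MASK64, total >> 64

def vector_add(a_limbs, b_limbs, carry_in=0):
    """Chain adde over the vector: an arbitrary-length big-integer add."""
    out, carry = [], carry_in
    for a, b in zip(a_limbs, b_limbs):
        limb, carry = adde(a, b, carry)
        out.append(limb)
    return out, carry

def dshl(limb: int, s: int, carry_in: int):
    """Shift one limb left by s (1..63), filling the low bits from a second
    input and returning the overflowing top bits as the next element's carry."""
    wide = (limb << s) | carry_in
    return wide & MASK64, wide >> 64

def vector_shl(limbs, s, carry_in=0):
    """Chain dshl over the vector: a big-integer shift by a scalar amount."""
    out, carry = [], carry_in
    for limb in limbs:
        res, carry = dshl(limb, s, carry)
        out.append(res)
    return out, carry

# 128-bit checks against Python's arbitrary-precision integers.
x, y, s = (1 << 127) - 12345, 987654321, 7
xl, yl = [x & MASK64, x >> 64], [y & MASK64, y >> 64]
sum_limbs, c = vector_add(xl, yl)
assert (c << 128) | (sum_limbs[1] << 64) | sum_limbs[0] == x + y
shl_limbs, c = vector_shl(xl, s)
assert (c << 128) | (shl_limbs[1] << 64) | shl_limbs[0] == x << s
```

Because each element consumes the previous element's carry and produces the next one's, looping the scalar instruction over the vector gives arbitrary-length big-integer operations without any hardware wider than 64 bits.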
Vector-scalar multiply is, again, exactly the same idea. The Intel mulx instruction doesn't have the add. There's a paper from about 12 years ago about using a couple of new instructions, mulx and adcx/adox, and interleaving them; it's an interesting paper. Instead, we do it based on multiply-and-accumulate. A normal multiply-and-accumulate would treat the input register as going into the LSB; what we do is slightly different. It's effectively the same as the vector-vector add, except that here, just as in long multiplication, we bring in the extra 64-bit digit from the previous column. Because you're producing a 128-bit result, the high half goes into a carry-out register, and when you do the chaining it becomes the carry-in of the next digit. Each time you call this instruction, it produces a new digit of your vector-by-scalar multiply.

It's exactly the same story for vector-scalar divide. I'm delighted to be able to say that the implementations of these two instructions, the chained divide and the chained multiply, are actually inverses of each other, including on the carry-in, which is really interesting. You can have a modulo as an input, which gets included in the division; it's not a from-scratch thing. We also had to special-case overflow, because if you look closely at Knuth's algorithm, it's not divide-by-zero, but there's a similar special case where you can't store the result in a 64-bit register.

### Summary so far

The initial concept was the usual 1-bit carry-in and carry-out, extended to 64 bits. The result is that you get, for free, vector-scalar arithmetic, you can do simple loops on top of that, and then you have Knuth's algorithms D and M. As a beautiful piece of irony, the original version of the ISA had a carry SPR (it would have been a 32-bit carry at the time), but it was deprecated, which is rather annoying; hence we are proposing these new instructions to make up for it. It turns out that the hardware is not made more complex by doing this, because you are not going beyond 64-bit hardware: you are not requiring 128-bit operations with 256-bit results; you're just outputting, as the carry-out, the second half of what is normally thrown away. ISAs have a bad rap for big-integer work because they don't have these instructions, and you end up with complex carry workarounds. The complication in hardware, which makes RISC proponents freak out, is that you need 3-in, 2-out instructions. But with microcode you can do some chaining and get 3-in, 1-out, for example. ((Check the slides))

I looked at OpenTitan. It has a 256-bit-wide data path with 32 256-bit-wide registers. Zero-overhead loop control would have been much better. As much as I love the lowRISC team for being a community interest company, I wince slightly when I see what they have done. The initial code they used is formally verified, but having added 256-bit arithmetic, unfortunately the proofs are unlikely to complete in a reasonable amount of time. If you can do everything in 64-bit or 32-bit, there's a chance that you can complete formal verification proofs within a few weeks of CPU time, but 256-bit formal verification won't complete before the heat death of the universe. 256-bit is great for Curve25519, but for RSA and others you run into exactly the same problem as a scalar ISA, just worse. The OpenTitan shift instruction is immediate-only and does not have a shift-by-register amount; it does merge the 2 operands, so it has the core of the thing I showed you on an earlier slide. By comparison, with the vector-chained 64-bit approach, you can use a single scalar register as the 64-bit carry-in and 64-bit carry-out. Although it sounds like it's not possible to do this in place, that you might end up overwriting something, the vector looping can go in the reverse direction: you take the carry-out going into one register, which then gives you the previous element's input. I've worked it out.
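And a matching sketch of the chained vector-by-scalar multiply and divide described above (again a model of the semantics, not the proposed instructions' official pseudocode): the multiply carries the high half of each 128-bit partial product upward, the divide carries the remainder downward from the most significant digit, and running one after the other round-trips exactly, carry included.

```python
# Sketch of the chained vector-by-scalar multiply and divide (a model of the
# semantics described above, not the official pseudocode). The multiply
# propagates the high half of each 128-bit product upward as a 64-bit carry;
# the divide propagates the remainder downward; they invert each other.

MASK64 = (1 << 64) - 1

def maddx(a: int, b: int, carry_in: int):
    """One multiply-accumulate step: low 64 bits out, high 64 bits as carry."""
    prod = a * b + carry_in
    return prod & MASK64, prod >> 64

def divremx(n_hi: int, n_lo: int, d: int):
    """One divide step: 128-bit numerator (carry:limb) by a 64-bit divisor."""
    n = (n_hi << 64) | n_lo
    return n // d, n % d            # quotient limb, remainder becomes the next carry

def vec_mul_scalar(limbs, scalar, carry_in=0):
    """Multiply a little-endian limb vector by a 64-bit scalar."""
    out, carry = [], carry_in
    for a in limbs:
        limb, carry = maddx(a, scalar, carry)
        out.append(limb)
    return out, carry               # final carry is the new top digit

def vec_div_scalar(limbs, scalar, carry_in=0):
    """Divide a little-endian limb vector by a 64-bit scalar (top digit first)."""
    out, rem = [0] * len(limbs), carry_in
    for i in reversed(range(len(limbs))):
        out[i], rem = divremx(rem, limbs[i], scalar)
    return out, rem                 # remainder comes back out as the final carry

# Round trip: multiply then divide returns the original limbs, with zero remainder.
value  = [0xDEADBEEF00000001, 0x0123456789ABCDEF, 0xFFFFFFFFFFFFFFFF]
scalar = 0xFEDCBA9876543210
prod, top = vec_mul_scalar(value, scalar)
back, rem = vec_div_scalar(prod, scalar, carry_in=top)
assert back == value and rem == 0
```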
### Conclusion

We went back to the Knuth D and M algorithms and examined what they were trying to achieve. You can do 64-bit carry-in and carry-out. Keeping to the 32-bit or 64-bit Power ISA, with 64-bit maximum hardware, means that if you are doing a formal correctness proof, it will complete in a reasonable amount of time. It's reasonably straightforward to do these kinds of ISA operations. It might freak out pure-RISC proponents (3-in, 2-out), but look at the number of instructions and the algorithm efficiency, and it speeds up general-purpose code, which means it will end up in general use and you won't end up with more of a maintenance headache.

Here's the pseudocode for these instructions. We have unit tests for all of them. You can see here that the product is RA * RB, which is the normal way, but we add in the carry-in on top of it as well, and then we return both the high half and the low half.

Q: Where are you right now in terms of beginning to test this out or look at the actual performance improvements? Obviously 64-bit is useful, but a lot of our crypto-- which crypto algorithms would it have the biggest impact on?

A: All of them. Anything that requires arbitrary-length big-integer math. Although these are 64-bit registers, that's just the base unit. You can set the vector length to 32, so you end up with a chain of 32 64-bit registers in one instruction, which is 2048 bits. Our first CPU will have 128 64-bit registers. Again, you have the standard load/store. If you really want to go to 31684-bit-wide values, then you go back to the standard Knuth algorithm. Take a look at mulmnu.c - this one has the Knuth algorithm.

Our first target will be a 22nm node process. We will have the vector instruction set in it. Due to a quirk of how it works, although it will be single-issue, those single issues will be capable of issuing multiple scalar instructions into the backend pipeline. So it's a single issue of multiple vector-scalar operations. Technically where we are: we have simulations.

https://git.libre-soc.org/?p=openpower-isa.git;a=tree;f=src/openpower/decoder/isa;hb=HEAD

We are moving to a clock-cycle-accurate simulation, and we're also moving to a full FPGA implementation, probably by August 2023. We are ready to lay out our first chip because we have solved enough of the key critical problems in the design of the chip. We're raising venture capital money. We need money, and we're looking for volunteers and people to get involved in the project. We think we're close to VC funding. From the day we're VC funded, we will have a prototype chip through a shuttle run within 18 months, and a commercially available chip, subject to fab availability, within 24 months. This will probably be the first open-source microprocessor of this power and capability available on the market for people to use.

## Pitfalls and approaches to open-source security on semiconductors

We will talk about background on intellectual property rights, ASIC design flow, copyleft, practical considerations, license approaches, and Cramium's approach. Does open-source hardware really achieve the same purposes as open-source software? What about open-source licenses for hardware?

### Intellectual property

You might suppose that this kind of thing is obvious and settled, but there is in fact controversy in intellectual property.
Copyright started with an act in 1790, and then technology made the world more complex. There are guidelines around what can be copyrighted and what cannot. Something more utilitarian is often subject to patent, not copyright; the expression of an idea is copyrightable, but not the idea itself. If there's limited room for expression, that affects copyrightability. For those of you who are as old as I am, you might recall that there was a time when people questioned whether software could be copyrighted at all. From first principles, it's not a priori obvious that software can be copyrighted. A seminal case over this was in 1983. If you look at the executable form, does it meet these tests? The district court opined that it did not meet the copyrightability test, and that ruling was reversed by an appellate court. It's not obvious from first principles; we tend to forget that today because it was 40 years ago. I would just point you to some references that discuss this.

A chip design involves diverse design artifacts which, if you compare them on first principles, compare in varying ways. There's much debate about which intellectual property rights even apply to hardware at all. There are some great analogies to software in the papers, but these are papers, not case law. I will mention a few case-law items I found interesting. One case holds that a schematic is copyrightable; but a schematic is only one of many types of design artifacts in ASIC development. Another thing that bears mentioning is that there was a specific act in 1984, the SCPA. A lot of people misunderstand this; it is not a copyright or patent or trademark or trade secret. It's a sui generis right in a "maskwork". Unlike a copyright, it must be registered. A maskwork is only one design artifact used in making a chip, and it has its own one-off right. Also, reference 24 makes it clear that a lot of people don't register this maskwork right, so it's not used all the time. What intellectual property rights apply to hardware? "Every single effort to tackle the problem of..."

### ASIC design flow

Some parts of the chip start with what looks like a programming language. You could say that VHDL or Verilog is somewhat like source code for software. Then there's the digital/logic part, where you take that logical description and get into something that starts to have more relationship to the factory, like the RTL, or descriptions of gates and so on, like netlists. These might have drive strengths and setup timing; you map the logic into this and create a sea of gates called a netlist. Some authors talk about an analogy to compiling software into a machine-code instruction set. "Library" has a completely different meaning here than in software: the libraries contain proprietary information from the factory at which you fabricate the chip. The next step has even more proprietary information. The netlist is a logical thing, but then you need to map it into a physical arrangement of the gates in this process. All of this information, like the sizes, the inductance, the resistance, the manufacturability, is proprietary information from the foundry or from a third-party vendor that the foundry engages. You might not see it directly or have a contractual relationship, but typically they engage third parties to make their IP available. And that's just the logic part; there's much more to chips.
There's also memory, which isn't considered logic; memories end up getting compiled into the chip by a special tool that makes them more efficient. It's also common to bring in phase-locked loops, power regulators, a processor core, some non-volatile memory, fuses for calibration, a TRNG, and then even vanilla things like USB are complex: you don't just connect a logic gate to the pin of the chip, there needs to be a physical driver there, and there are other specialized functions besides. These can come in various forms, and that affects what "complete source" even means. Some of these might come in as RTL, or they might be hard layouts that get plugged in by the fabless company, and sometimes the fabless company doesn't even have access to the final layout and it's plugged in by the foundry.

There are timing checks, and laborious script development for timing; it's hard to use the code if you don't have this. Test benches are another one. In software, you can incorporate someone else's software, compile it, and test it. In chips, you can't just do that; it's extremely expensive. If you are licensing a chip component from someone, you license it from them and then import their test benches, and if the vendor doesn't have those then their block isn't going to be that useful. There are tools that insert IP into your netlist or into your final design, like memory BIST for self-test. Each unit has to be tested as you make it. These are all digital. You also have analog, where you do schematics and you use items from the physical design kit to help you design, lay out, and verify. There are a bunch of other quasi-manual tweaks to the layout. A software build might take minutes; if you go back and change one line of your RTL, though, it takes 6 weeks to get back to the same point. All these things modify your final artifact, which is the GDSII form, which is ultimately what creates the photographic plates that go to the factory.

I don't expect you to absorb all these details. But there are many stages, and they involve varying third-party IP. These types of artifacts are fundamentally dissimilar. In software, an object file and a binary aren't that different; there are linker-locators and things get resolved. We think of software as hierarchical, but it's all just code in the end. That's not the case with chips: there's a real, physical hierarchy of fundamentally dissimilar items. For each of those items, what's the correct licensing? What is the correct IP type, or the right IP rights? What is "source" here? What is "complete" or "corresponding source" in this domain? It's much more complex in hardware than in software.

### Copyleft

I think some of you know this but some don't: copyleft is a term that was coined to describe licenses that promote and require proliferation, instead of prohibiting proliferation like a copyright. Typically when you take something in under copyleft, you receive a license, but subject to certain obligations to further publish and sublicense whatever you create. So what is the boundary of that which you are obligated to provide an outbound license to? Licenses are described as permissive, or weakly reciprocal, or, if they cast a wider net, strongly reciprocal. For software licenses, the two questions are: what do you have available to work with, and what are the rules? There's a small amount of license text, and then a huge corpus of literature and de facto practice interpreting what copyleft means. There's also a small body of case law around this. There have been entire careers just in the interpretation of copyleft.
### Incompatibility

All of this, whether the text of the license or the opinion around it, uses terms like statically linked, dynamically linked, system library, procedure call, etc. These are all software terms. In GPLv3 there are defined terms, but they only have meaning in software. How would you interpret them for hardware? It's like taking a rule from baseball and asking how to apply it to a different sport; it's basically nonsense.

Why does this matter? What do people mean when there is a license conflict? If you make a product that combines something under a copyleft license with something under a proprietary license, then in some cases the boundary of the copyleft licensing might expand to include the proprietary code. So you might have an obligation not to disclose the proprietary work, and at the same time an obligation to publish or sublicense it. To whom you have that obligation is, again, a field of controversy, but you have conflicting obligations. ASIC design is more complex and has more design artifact types, so avoiding this conflict is more complex than in software. There are large companies that will not allow any GPLv3 in through the door simply because it's too complex, even for software licensing. If companies have that position for software, you have to ask what your position in hardware should be, where it's even more complex to determine the ramifications.

There are real risks and consequences here. Early in open-source software, enforcement was guided by ideology for open source. More recently there have been patent trolls and copyright trolls. The difference is that copyright carries statutory damages, so you don't have to show actual damages; that's really what gives open source its teeth, according to this reference. But a big difference between software and hardware is that a software provider can quickly modify their code. ASIC vendors can't do that, which might mean more risk and more leverage in this area.

### Practical considerations

What is the purpose of open-source hardware? Why do we think it's desirable? You can make an analogy to software again. Does open-source hardware really provide these freedoms, like those espoused by gnu.org? It's harder to say it does, because you can publish certain materials, but you can't really emulate precisely what will happen when you build silicon, and actually building the silicon is extremely expensive. It also involves a lot of third-party IP to get to a workable chip. I constantly hear from people who are immersed in software saying, oh, I read an article about this open design workflow; does it really work? It has a place in the ecosystem, and I wouldn't want to say it's not useful, but I think people tend to underestimate the vast difference between a low-NRE design flow for prototyping and what you need to be in the semiconductor business. Chips have gotten exponentially better in cost, size, and performance, but this has been accompanied by an exponential increase in design costs. These low-NRE design flows are not the same thing at all. Having a consumer-level 3D printer doesn't mean you can be in the Lego brick business, just in terms of manufacturing tolerances, volume, and cost. Electrons are free, and atoms are expensive.

### Inspectability

With code, you end up with a file and can take a hash of the file. But with an ASIC there are so many diverse layers of hierarchy: how do you know what you get back matches what you think it is?
And that's unit by unit: just because one unit is inspected doesn't mean that all the chips are the same. There are things like signatures, test vectors, and optical scanning, but each of those is limited, and it's not the same as comparing software.

### Business considerations

If I am licensing open-source software, then maybe the community can make some improvements and I can incorporate them. But in hardware, if someone makes improvements, it's not necessarily easy for me to incorporate them. Everyone in the ecosystem faces these high costs for building chips.

### Licensing approaches

If you go to opencores.org, you can see a bunch of licenses. The most common approach when people put RTL on a site like this is to use a software license, which addresses all the problems I've brought up by simply ignoring them. Some of the references point out the problems with that, so some say there should be hardware-specific licenses, like TAPR, CERN, CERN-OHL-W, SolderPad, OSHWA, etc. These are all very different in their philosophy; when you read their rationales, they are quite different. I think the CERN licenses are well thought out, with permissive, weakly reciprocal, and strongly reciprocal variants. The software terminology is usually about libraries and system libraries and dynamic linking; CERN instead starts with hardware terminology, like whether a component is available, and then applies that to board-level hardware and chip design. I've never seen IPIL used; it's embedded in an academic paper somewhere, and I'm unaware of any use of that license in industry yet. It's a nascent field: instead of using software licensing, let's create new licenses that are specific to hardware.

### Cramium's approach

Our approach, subject to change, is that we're building a chip and we're publishing under an outbound CERN-OHL-W license. This diagram illustrates what can be done with it. The idea is that the core cryptography is implemented in these blocks, but there's also a RISC-V core interfacing to some countermeasures that are physical things you can't emulate on an FPGA board anyway, so we meet the CERN-OHL-W definition. All of this will be published under that license. This will let you instantiate it off the shelf: buy a PC, buy an FPGA board, instantiate it, and run it. When the chip comes, you can run those same things and verify that the chip responds in the same way. That is the goal. There are certain things that are exactly what's on the chip, some things that are like what's on the chip but can't be the same because of third-party issues, and then some functions that are generic and we can't emulate precisely.

On inbound IP, there are no copyleft-licensed components that we're using. While not legally required, our goal is that the inbound and outbound licenses match; that seems like a good-faith way of engaging with the community. On inspectability, take a look at Bunnie's blog post where he discusses inspectability. We are making this package at his request: it's a wafer-level package, where you put a redistribution layer on top of the die, and after singulation and backgrind you flip the whole thing over and mount it on a PCB. That means the backside is basically the raw wafer, and you can shine an infrared laser through it and get some imaging. He goes into why that is useful in inspecting the die. It won't inspect down to the gate level, because the wavelength is around a micron.

Open-source software itself is a developing field with open questions and controversies.
Open hardware is comparably immature, and open ASIC design is even less mature than that. The fundamentals are not clear, nor even the community's goals: what are you trying to achieve? There's complexity in implementation, especially for copyleft, given the complexity and the varied artifacts in an ASIC design flow. Despite these problems, we want to carry the ball forward as much as we can at Cramium.