# Instruction Set Architecture (ISA), Translator and Emulator

I made this note as a future reference for myself (or any reader) after my [friend](https://www.facebook.com/LSlowmotion "my friend's social media") asked "Is the ISA useless in the end?" in response to the boom of **Winlator**. The references I use are based mostly on two books written by David A. Patterson and John L. Hennessy: *Computer Organization and Design (RISC-V ed.)* and *Computer Architecture: A Quantitative Approach*. Those are great books that cover the basics of computer architecture (this is by no means a promotion). I recommend reading *Computer Organization and Design* first and then continuing with *Computer Architecture*, since the former talks more about the basic concepts of modern computer design and the latter explains more of the techniques hardware designers can use to improve performance.

If the main topic is going to be translators and emulators, why talk about ISA? Well, I need a place to put my notes about ISA, and I think it is a good place to start before talking about translators and emulators. Since I'm going to talk about running programs across different hardware, it is important to understand ISA in order to understand how hard that is.

## 1. Instruction Set Architecture (ISA)

### Definition

I will not just copy a definition that can be found by searching the internet or a book and be done with it. Instead, I will give two definitions: one from a book and one of my own.

#### Textbook Definition

In *Computer Organization and Design* p.22 ISA is defined as "Instruction set architecture also called architecture.
**An abstract interface between the hardware and the lowest-level software that encompasses all the information necessary to write a machine language program that will run correctly, including instructions, registers, memory access, I/O, and so on.**"

#### My Personal Definition

Here's mine: *"ISA is a set of instructions that can be used to instruct/tell hardware to do things"*. That's basically what an ISA is for me, a standard. (technical details in chapter 3)

:::danger
This is my description of what ISA is. PLEASE do not use this as an answer to any class assignment, as it is not a widely accepted definition (well of course, this is my own personal definition lol :joy:)
:::

By hardware I mean computer hardware: the clumps of electronic circuits that only understand ones and zeroes. In the **Computer Organization and Design** book (p.14), instructions are defined as "...collections of bits that the computer understands and obeys". So basically ones and zeroes.

In **Computer Architecture: A Quantitative Approach** p.14, the "things" that the computer hardware can do are usually called **operations**. The book categorizes operations as data transfer, arithmetic/logical, control, and floating point operations.

In other words, an ISA is a widely accepted way of describing how hardware performs operations. Programmers and computer architects have something to agree on, and they use it to help in hardware and/or software design.

### RISC vs CISC

I've come across the terms **RISC** and **CISC** before. **Reduced Instruction Set Computer** (RISC) and **Complex Instruction Set Computer** (CISC) are the two main approaches to instruction set design. The main difference is that a RISC computer usually does one simple thing per instruction (for example, only loading a value or only adding two registers), while a CISC computer often does more than one thing in a single instruction (for example, reading a value from memory and adding to it).
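
To make the difference a bit more concrete, here is a minimal sketch in Python of a toy machine. Everything in it is an assumption made for illustration: the instruction names (`load`, `store`, `add`, `add_mem`), the register names and the memory address are hypothetical and do not belong to any real ISA. It only shows that the same work (adding a register to a value in memory) takes three simple instructions in a RISC-like style and one combined instruction in a CISC-like style.

```python
# Toy model only: the instruction names and the machine itself are hypothetical.

def run(program, regs, mem):
    """Execute a list of (op, *operands) tuples and return how many instructions ran."""
    count = 0
    for op, *args in program:
        if op == "load":        # RISC-style: register <- memory[address]
            rd, addr = args
            regs[rd] = mem[addr]
        elif op == "store":     # RISC-style: memory[address] <- register
            rs, addr = args
            mem[addr] = regs[rs]
        elif op == "add":       # RISC-style: rd <- rs1 + rs2 (registers only)
            rd, rs1, rs2 = args
            regs[rd] = regs[rs1] + regs[rs2]
        elif op == "add_mem":   # CISC-style: memory[address] <- memory[address] + register
            addr, rs = args
            mem[addr] = mem[addr] + regs[rs]
        else:
            raise ValueError(f"unknown instruction: {op}")
        count += 1
    return count


# Same job written in both styles: add register r2 to the value stored at address 0x10.
risc_program = [("load", "r1", 0x10), ("add", "r1", "r1", "r2"), ("store", "r1", 0x10)]
cisc_program = [("add_mem", 0x10, "r2")]

risc_regs, risc_mem = {"r1": 0, "r2": 5}, {0x10: 7}
cisc_regs, cisc_mem = {"r1": 0, "r2": 5}, {0x10: 7}

risc_count = run(risc_program, risc_regs, risc_mem)
cisc_count = run(cisc_program, cisc_regs, cisc_mem)

print(risc_count, cisc_count, risc_mem[0x10], cisc_mem[0x10])  # prints: 3 1 12 12
```

Both programs leave the same value in memory; only the number of instructions differs. Real x86 and RISC-V instructions are of course far richer than this toy.
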
### Instruction Sets

There are many kinds of ISA. Some of the famous ones are:

- x86 instruction set (CISC)
- RISC-V (pronounced "risk five"; as the name implies, it is RISC)
- ARM (RISC)
- MIPS (RISC)

Different ISAs have different ways of **how** to tell the computer what operation to execute. For example, giving a RISC-V processor the instructions (the ones and zeroes) meant for x86 will confuse the processor, and vice versa. Technical details are written in chapter 3.

Imagine talking to someone in a language that they do not understand. It will be very confusing and most, if not all, of the information is lost. A translator is needed to translate what was said into another language. (chapter 4 will talk about translators and emulators in a bit more detail)

While there are debates on which one is superior, it is the computer architect's job to determine which one to pick.

:::info
Now I know what it means when someone says "an x86 processor"/"an ARM processor" instead of just nodding without having the slightest idea of what it means.
:::

Fun fact: ISAs are not only about CPUs. Hardware like GPUs also has them, for example [radeon](https://www.amd.com/content/dam/amd/en/documents/radeon-tech-docs/instruction-set-architectures/rdna4-instruction-set-architecture.pdf "RDNA4 ISA") and [nvidia](https://docs.nvidia.com/cuda/parallel-thread-execution/ "Nvidia's ISA"). But for simplicity I will focus on CPU ISAs in this article.

## 2. ISA's Purposes

### What for?

Let's look at what the book says about ISA:

>Both hardware and software consist of hierarchical layers using abstraction, with each lower layer hiding details from the level above. One key interface between the levels of abstraction is the instruction set architecture—the interface between the hardware and low-level software. This abstract interface enables many implementations of varying cost and performance to run identical software.
>-- **Computer Organization and Design** p.22

In short, ISA adds a layer of abstraction. This works because humans can still tell the computer what to do without knowing the details of the ones and zeroes it uses (details in chapter 3).

### By Design

#### Abstraction

The keyword for this is "abstraction". Abstraction is used to increase productivity in both hardware and software design.

>... lower-level details are hidden to offer a simpler model at higher levels.
>-- **Computer Organization and Design** p.11

Computer architects (yes, it is a real job) and programmers need to be productive, especially lately when computer hardware has far more capabilities than ever before. So it is by design that the ISA enables programmers to write code and run it on computer hardware without even knowing the details of how the processor itself works. And that is amazing.

#### Quick example to skip chapter 3

A programmer writes code in C: ```C = A + B;```

A translator (more specifically, a compiler) turns it into an assembly language statement: ```add C, A, B```

An assembler turns it into machine language so that the computer understands it. For example, assuming that C, A and B are represented by registers *x9*, *x20* and *x21*, it becomes: ```00000001010110100000010010110011```

#### How it impacts Productivity

That's how abstraction works in a nutshell. ```00000001010110100000010010110011``` can be communicated by only saying ```C = A + B;```. That's how productivity is increased: programmers spend their time designing a program, with little to no time spent on 1s and 0s, while computer architects spend their time designing powerful hardware, with little to no time spent on the algorithm and code that is going to be executed by the hardware. It is amazing and it is by design!

### But...

Although saying ```00000001010110100000010010110011``` will be understood by a RISC-V processor, an ARM processor won't understand it.
The way an ARM processor speaks, it should have been ```10001011000101010000001010001001```. Is there a way to make other hardware understand the same information? Yes, plenty. How? Go to chapter 4.

## 3. ISA, How to Communicate With Machine (Optional)

:::info
This is a technical part showing a little bit of how an ISA can be learnt/understood and how it gives a layer of abstraction. Those who are not interested can skip this part and go directly to chapter 4.
:::

(I just put it here as a reference) Below is a list of instructions and how the book categorizes them.

![image](https://hackmd.io/_uploads/HJ8NOkC1xg.png)
-- **Computer Architecture: A Quantitative Approach** p.15

### Instructions

Since I understand better with an example, I put one here. Below are some examples from the RISC-V ISA:

|Opcode|Instruction Meaning|
|-------|-----|
|ld|Load doubleword (from memory into an integer register)|
|sd|Store doubleword (from an integer register into memory)|
|add|Addition|
|sub|Subtraction|
|mul|Multiplication|
|div|Division|

(details of how to use them are in the next sub-chapter)

See that each instruction tells the hardware (in this case the CPU) to do a certain operation. Although computer hardware only understands ones and zeroes, these human-readable instructions still end up being read and understood by it. What sorcery is this?

#### What Human Reads

The instructions mentioned before are written in what is called **assembly language**, in this case assembly for a RISC-V processor. Assembly is a representation of the ones and zeroes used to instruct computer hardware to do operations. In other words, assembly is the "version" that can be read by humans. Then what does the computer read? Well, assembly can be converted into a language that can be read by the computer, called **machine language**.

For example, ```add x9, x20, x21``` is easier for me to read than ```00000001010110100000010010110011```.
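
The conversion from assembly to machine language can be sketched in a few lines of Python. This is a minimal sketch, not a real assembler: it only handles the `add` instruction with `x0` to `x31` register names, and the function names `encode_r_type` and `assemble_add` are made up for this note. The field widths (7, 5, 5, 3, 5 and 7 bits) and the field values for `add` follow the RISC-V R-type layout that the next sub-section breaks down.

```python
def encode_r_type(funct7, rs2, rs1, funct3, rd, opcode):
    """Pack the six R-type fields (7, 5, 5, 3, 5, 7 bits) into one 32-bit word."""
    fields = [(funct7, 7), (rs2, 5), (rs1, 5), (funct3, 3), (rd, 5), (opcode, 7)]
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        # Shift what we have so far to the left and append the next field
        word = (word << width) | value
    return word


def assemble_add(line):
    """Assemble one 'add rd, rs1, rs2' line (sketch: only 'add' is supported)."""
    mnemonic, operands = line.split(maxsplit=1)
    if mnemonic != "add":
        raise ValueError("this sketch only knows 'add'")
    # 'x9' -> 9, 'x20' -> 20, 'x21' -> 21
    rd, rs1, rs2 = (int(name.strip().lstrip("x")) for name in operands.split(","))
    return encode_r_type(
        funct7=0b0000000, rs2=rs2, rs1=rs1, funct3=0b000, rd=rd, opcode=0b0110011
    )


word = assemble_add("add x9, x20, x21")
print(format(word, "032b"))  # prints: 00000001010110100000010010110011
```

The printed bits are the same ones shown in the text, so the "readable" and "numeric" versions really are the same instruction written in two ways.
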
#### What Computer Reads

Machine language is the numeric version of the assembly, the one that the computer reads. Assembly is the "readable" version so that humans do not get confused by plain ones and zeroes. To convert assembly into machine language, engineers use a tool called an **assembler**. An example would be an *8086 assembler*, which converts assembly into machine language that can be read, understood and executed by an 8086 processor.

Let's look at an example from *Computer Organization and Design* p.81: an instruction to add the contents of registers *x20* and *x21* and save the result in register *x9*.

:::info
A register is a part of the processor that can be **directly** accessed/used to store/load data.
:::

In assembly it is written as: ```add x9, x20, x21```

It actually means: ```00000001010110100000010010110011```

Broken down:

|0000000|10101|10100|000|01001|0110011|
|-------|-----|-----|---|-----|-------|
|7 bits|5 bits|5 bits|3 bits|5 bits|7 bits|
|field 1|field 2|field 3|field 4|field 5|field 6|

- Each segment is called a field
- This layout is called the instruction format
- The first, fourth, and sixth fields (funct7, funct3 and opcode) determine what the computer will perform, in this case an addition
- The second field (rs2) gives the register (x21) that is used as one source for the addition operation
- The third field (rs1) gives the register (x20) that is used as the other source for the addition operation
- The fifth field (rd) contains the register (x9) used to store the result of the addition operation

Hence the instruction ```add x9, x20, x21``` is understood by the computer, since it is actually ```0000000 10101 10100 000 01001 0110011```.

Different ISAs have different ways of turning assembly into machine language. So two different ISAs might have similar or even identical assembly but different machine language. Or the same machine language can mean one thing for one CPU while it means nothing for another CPU.
For example, in RISC-V, ```add x9, x20, x21``` is equal to ```0000000 10101 10100 000 01001 0110011```

While in the ARMv8 instruction set, ```ADD X9,X20,X21``` is equal to ```10001011000 10101 000000 10100 01001``` (note that it has 5 fields instead of 6 like RISC-V)

Or even in MIPS, ```add $t0,$s1,$s2``` is equal to ```000000 10001 10010 01000 00000 100000``` (note that MIPS uses a different naming format for the registers)

## 4. Emulator, Translator, and Compiler

Now, about running the same code on different machines. I will put this link here as a helpful reference: https://www.microcontrollertips.com/compilers-translators-interpreters-assemblers-faq/

### Translator

As the name suggests, a translator translates code from one programming language into another, for example from *Python* to *C*/*C++*, or even from *C* to *Assembly*. It is quite a broad term: as long as it converts code into another programming language, it can be called a translator.

:::info
Zluda is a kind of translator, since it translates CUDA so that it can run on other hardware
:::

### Compiler

#### What does it do

>Compilers perform another vital function: the translation of a program written in a high-level language, such as C, C++, Java, or Visual Basic into instructions that the hardware can execute.
>-- **Computer Organization and Design** p.14

The first time I heard the term *compiler* was when I needed to convert my code into an executable so my computer could run the code that I wrote. What the compiler does is translate the written code into a file that can be executed by the computer (hence the name executable), since the computer cannot directly run code written in, for example, *C*. I suppose it can be said that **a compiler is a translator**.

#### Quick Example

For example, a programmer is using a high-level programming language such as *C*.
The code can then be translated into assembly, or sometimes directly into machine language, the ones and zeroes that computers understand.

![image](https://hackmd.io/_uploads/SkqMS2Dxgx.png)
-- **Computer Organization and Design** p.15

With the creation of high-level programming languages, and then compilers to translate them into instructions that computers can understand, programmers can write a program without having to speak the same language as the computer. It is also important that computers with various costs, performance levels and specifications can run the same code that the programmer writes.

>As we will see, the hardware in a computer can only execute extremely simple low-level instructions. To go from a complex application to the primitive instructions involves several layers of software that interpret or translate high-level operations into simple computer instructions, an example of the great idea of abstraction.
>-- **Computer Organization and Design** p.13

#### Running the same code on a machine with different ISA

The short answer is **yes** (kinda), although this answer drops a lot of details. In general, a programmer can write a program in C, compile it with GCC, for example, to create an x86 executable, and run it on an x86 computer. The same source code can then be compiled with a RISC-V C compiler to create a RISC-V executable that runs on a RISC-V computer. But sometimes there are lines of code that are added to speed up the code on a specific architecture/hardware. That kind of code cannot be compiled and run on another architecture without changes.
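
The first part of the paragraph above (one source, one compiler per ISA) can be pictured with a toy code generator in Python. It is only a sketch under heavy assumptions: a real compiler does far more, the function name `compile_sum` and the `TARGETS` table are hypothetical, and the x86 register choice (`eax`, `ebx`, `ecx`) is arbitrary. It shows the same statement `C = A + B` turning into different assembly text depending on the target.

```python
# Toy "compiler" back end: one source statement, several target ISAs.
# The register assignments below are illustrative assumptions.
TARGETS = {
    "riscv": {
        "regs": {"C": "x9", "A": "x20", "B": "x21"},
        "code": ["add {dst}, {a}, {b}"],
    },
    "armv8": {
        "regs": {"C": "X9", "A": "X20", "B": "X21"},
        "code": ["ADD {dst}, {a}, {b}"],
    },
    "x86": {
        "regs": {"C": "eax", "A": "ebx", "B": "ecx"},
        # x86 'add' takes two operands, so the first source is copied first
        "code": ["mov {dst}, {a}", "add {dst}, {b}"],
    },
}


def compile_sum(dst, a, b, target):
    """Emit assembly text for 'dst = a + b' on the chosen target."""
    if target not in TARGETS:
        raise ValueError(f"no code generator for target: {target}")
    regs = TARGETS[target]["regs"]
    return [
        line.format(dst=regs[dst], a=regs[a], b=regs[b])
        for line in TARGETS[target]["code"]
    ]


for target in TARGETS:
    print(target, compile_sum("C", "A", "B", target))
```

The source statement never changes; only the target name does. Asking for a target that is not in the table raises an error, which is roughly what happens when no compiler exists for the hardware you want to run on.
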
The following code is a general implementation of matrix multiply in *C*. It should need little to no changes when compiling for both x86 and RISC-V:

![image](https://hackmd.io/_uploads/ByfBYdcSxg.png)
-- **Computer Organization and Design** p.219

The following code is optimized for x86:

![image](https://hackmd.io/_uploads/SywdqdqHxl.png)
-- **Computer Organization and Design** p.220

Running it on x86 might be faster than the general/unoptimized code, but a RISC-V C compiler won't be able to compile it, let alone a RISC-V machine execute it.

>"The efficiency of the compiler affects both the instruction count and average cycles per instruction, since the compiler determines the translation of the source language instructions into computer instructions. The compiler’s role can be very complex and affect the CPI in varied ways."
>-- **Computer Organization and Design** p.39

Is there another way for a RISC-V computer to run the code? Yes, with an emulator, for example [Felix86](https://felix86.com/ "Felix86 website"). By imitating an x86 processor, *software-ly-ish*, a RISC-V processor can run x86 instructions, although neither efficiently nor perfectly. (note on efficiency in chapter 5)

This means the original source code (in this example, the C code) is not even required. An x86 executable compiled on/for an x86 computer can, in theory, be run using the emulator (software that runs on a RISC-V processor and whose job is to emulate an x86 processor, thus allowing the RISC-V processor to execute x86 executables). This is an important idea behind using an emulator.

### Emulator

#### Definition

An emulator usually refers to either hardware or software that is used to emulate/imitate a program or a device. An emulator makes what is usually called the *host computer* able to imitate other hardware or software. An example is a video game emulator like RPCS3, PCSX2 or ePSXe.
It makes it so that any x86 (or sometimes ARM) processor it runs on can imitate the game console's processor, allowing PlayStation software (in this case games) to be played on a device that is not a PlayStation, like a home computer or even a smartphone.

#### Why Emulator

Either there is no source code and only executables are available, or sometimes it is just the easiest way to run another architecture's executables.

#### Software and Hardware Emulator

Some people split emulators into two kinds, **software emulators** and **hardware emulators**. Others use the word emulator for anything that simulates software, hardware, or both. In this article I will talk about the two kinds.

##### Software

A software emulator imitates by using software. The software makes sure that the processor it runs on can understand the original instructions that have to be executed, since the original instructions are for the emulated hardware (for example, instructions for the PS1 or PS2). Just like software rendering, encoding or decoding, code (software) is written so that the original instructions can be executed by the host computer. It might not be as fast as dedicated hardware, but it gets the job done. Examples of software emulators are RPCS3 and PCSX2.

##### Hardware

A hardware emulator, on the other hand, usually uses configurable hardware (like an FPGA or programmable ASICs) to imitate, for example, a PlayStation's processor. The hardware is configured to have capabilities similar to those of a PlayStation's processor or any other hardware it imitates. Doing so means that the hardware can execute the original instructions (in this case PlayStation games) without needing any help from software. A hardware emulator sometimes costs more, but it is usually fast.
An example of a hardware emulator is the SuperStation One by [Taki Udon](https://x.com/takiudon_ "Taki Udon's X account").

#### Winlator

![image](https://hackmd.io/_uploads/rJ_aTYqHgx.png)

[Winlator](https://github.com/brunodev85/winlator "Winlator Github") is an Android application that enables Android smartphones to run Windows (x86_64) applications. So in a sense, it is a software emulator that allows the ARM processor in Android devices to run Windows x86 software.

There are people with the idea of playing PC games on Android phones and calling it a *gaming setup*. While that is technically correct, even assuming the Android phone has the same raw performance as a desktop computer (most actually have less), the performance delivered is in general not going to be the same as running on the original hardware. For those who care about performance, this is a center of debate. Most people, as long as it runs, don't care how or what it runs on. Just like coffee or tea: hardcore enthusiasts might know what a good cup of coffee is, but most people just want a decent cup. And that's the thing about emulators: as long as it gives decent performance (for example, in PS emulators, as long as the game is playable), most people don't care too much about the details. At least, that's what I can say from my observation and personal experience.

### Efficiency, it is never 100%

Efficiency is a problem. For software emulators, since the hardware it runs on is not the same as the original hardware, it is impossible (at least when I wrote this article) to have the same efficiency, be it time-wise or energy-wise, as the original hardware when executing the original instructions.
It is just that processors have become so fast (for example, a Ryzen 5800X is way faster than a PS1 or PS2 processor) that running the original instructions through an emulator still gives fast enough performance for the user/player.

## 5. Compatibility, Performance, and Efficiency

:::danger
A lot of subjective statements below. Any additional sources are welcome and will be used to update the article
:::

What is the motivation for running inefficient code? There is no one exact answer, but usually it comes down to "I just want it to run". When I look at a repository or a tutorial, I just want to copy-paste it into my computer and run it without any errors. I suppose a lot of people work the same way. How do I know it is true for other people? Most A.I tutorials on the internet use Nvidia GPUs, and it is not a coincidence that Nvidia is also one of the most popular hardware choices for A.I. Nvidia's hardware is not the cheapest and it definitely is not the most efficient in terms of architecture, but it is the easiest to implement A.I on. And that's just A.I. There are many more kinds of programs than that, and I think the same applies to most of them.

### Requirements and Expectations

What matters is the expectation, sometimes called the requirement. Like jobs in real life, code is run with certain targets/requirements that have to be met. The target is usually set by the user, based on the user's own expectations, and most of the time it is subjective. As long as a program runs within expectations, most people who are not perfectionists won't complain much and will accept it as is. Especially now that hardware is much faster: as long as the requirement is met, it is enough. And when the requirement is as simple as "run it fast enough" or "run it and please don't be slow", efficiency might not be regarded as important, especially for consumer-grade hardware.
### Compatibility vs Performance

The program that the user wants to run might not be designed with broad compatibility in mind, or it might even be exclusive software made for specific hardware (looking at you, console exclusive games). Thus, when there are attempts to make it run on hardware that is not originally supported, especially without the source code, the most common trick is to use an emulator. By emulating the original hardware, my computer can run a program that it originally could not run.

I'm not sure of the exact number, but I suppose everyone can agree that running code through an emulator will never be 100% efficient. There will be a performance hit in emulation, especially with a software emulator, since the device's hardware is usually completely different from the original hardware that the software is supposed to run on. But hardware is much faster now than ever, and especially on home computers, running inefficient code will not produce any complaints since the performance is in most cases deemed fast enough. (In general, hardware has become much faster and cheaper. I can't say the same about GPUs, which in the last 5 years do not seem to support the "technology becomes cheaper over time" claim)

>"Our view is that the instruction set architecture is playing less of a role today than in 1990..."
>-- **Computer Architecture: A Quantitative Approach** p.xix

## Summary

![image](https://hackmd.io/_uploads/B1p_SDiHlg.png)
-- **Slide taken from the Morgan Kaufmann Computer Architecture Presentation** p.14

>"The ISA serves as the boundary between the software and hardware."
>-- **Computer Architecture: A Quantitative Approach** p.12

In a nutshell, it is not that people are deliberately ignorant of the existence of ISA; it is by design that the ISA is there so that hardware people can work without being aware of the details of the software people's work, and vice versa.
In technical terms, it provides an abstraction so that programmers do not need to understand the inner workings of the hardware and computer architects can focus on designing the hardware. Sometimes it is enough to just know and understand what the hardware is capable of.

It is by design that it is possible to write and run code or software without even knowing the hardware. And it is also possible to run the same code on other hardware without too many changes. Most problems arise when already compiled code has to be executed on different hardware. Especially for code without available source, running on incompatible hardware poses challenges, hence the need to emulate totally different hardware. In certain cases a translator is used instead, for example Cython, Radeon HIP or Zluda. In most cases it reduces the need to emulate the hardware, and in some cases it even reduces the performance lost due to inefficiency.

Even though in some cases a lot of performance is lost as a result of inefficiency, that is sometimes better than not being able to run the program at all. And when computer hardware is, say, 10x faster and 5x cheaper than hardware from 2-3 decades ago, it becomes more and more acceptable to run it that way without too many complaints.

## Sources

Hennessy, J. L., & Patterson, D. A. (2019). *Computer Architecture: A Quantitative Approach* (6th ed.). Morgan Kaufmann.

Patterson, D. A., & Hennessy, J. L. (2018). *Computer Organization and Design* (RISC-V ed.). Morgan Kaufmann.

Wang, L.-T., Chang, Y.-W., & Cheng, K.-T. (2009). *Electronic Design Automation: Synthesis, Verification, and Test*. Morgan Kaufmann.

## Q&A

### Qs

1. What is the general difference between an API (e.g. DirectX, Vulkan) and an ISA?
2. There are several pieces of software that make software written for one API work on another API (e.g. DXVK, VKD3D). What would their category be (translator, emulator, compiler, or something else)?
3. There is no mention of binary translation.
Is binary translation basically the same as emulation? Follow up: Is there any fundamental difference between binary translation and emulation?
4. There is no mention of instruction set extension emulation (e.g. [Intel's ISA emulator](https://www.intel.com/content/www/us/en/developer/articles/tool/software-development-emulator.html "Intel ISA emulator")) or cross-ISA extension emulation (e.g. [x86 AVX to Arm](https://www.phoronix.com/news/Box64-0.3-Release "x86 AVX to ARM")). With more complex extensions within each ISA, would it be more difficult to emulate (more performance loss, etc.)? And what about the possibility of a market that is dominated by a certain ISA (e.g. Android with Arm, PC with x86, etc.) being locked against other ISAs?
5. There is no mention of AoT vs JIT. Can you explain the AoT vs JIT paradigm in emulators?

### As

1. I don’t have any sources on that, but I regard APIs as akin to libraries, because they are full of functions that you call so that you don’t have to deal with the GPU's ISA or the GPU drivers. In a way, APIs provide an additional layer of abstraction. When an API is adopted by many, it is usually regarded as a standard.
2. In my opinion it is just another layer of abstraction, with the highest level being the software, below that the API, then the drivers, and below that I suppose are executables (binaries) that are managed by the driver.
   ![image](https://hackmd.io/_uploads/SytaldiHgx.png)
3. If a tool that translates from one assembly code into another assembly code is called a translator, then I think a tool that translates from one binary language into another binary language should also be called a translator. So, the tool for binary translation is technically a translator since it translates from one language to another. For the follow up question: binary translation translates the binary language of one machine into that of another, and a naive approach will sometimes do just fine.
Emulation, on the other hand, focuses on emulating/imitating the original hardware, so the approach is a bit different. With emulation, either software or hardware emulation, you don't have to do binary translation; you just make sure there is imitated hardware (either real hardware or simulated hardware) that can read the original binary language.
4. With some ISAs getting more and more extensions, it is definitely getting more difficult, in terms of complexity, for other ISAs to emulate them. As for performance loss specifically, it is arguable, since other ISAs can also add extensions to catch up in performance or reduce the loss (an example is RISC-V’s V and P extensions for SIMD operations). A computer architect can also add custom hardware and custom instructions outside of the standard ISA to help with performance loss, which adds to the number of possible design decisions.

   It is possible for a certain market to be locked in by a certain ISA. And it has already happened. Examples are the mobile market dominated by ARM (remember Intel Atom phones? I think it was the ASUS Zenfone 2 or something, x86 Android and such, it was a flop) and the laptop market dominated by x86 (rest in peace, Snapdragon laptops). If it happens, it happens. It doesn't have to be because of a good ISA. Sometimes it is just because people have used a standard for so long that adopting a new one is next to impossible, mostly due to compatibility issues. Waiting for people to build a workaround is not a guarantee. It all comes down to the incentives. If creating an alternative brings rewards and incentives, for example a significant cost reduction, then there is hope. Mostly it doesn't.
5. [under construction]
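
For answer 3, the contrast between emulation and binary translation can be sketched with a toy example in Python. All of it is an assumption made for illustration: the guest "ISA" with its two instructions (`addi`, `add`) is hypothetical, the function names `emulate` and `translate` are made up, and Python source text stands in for the host's machine language. The emulator reads the original guest instructions directly every time it runs, while the translator converts the whole guest program once into host code that then runs without looking at the guest instructions again.

```python
# Toy guest ISA (hypothetical): ("addi", rd, rs, imm) and ("add", rd, rs1, rs2).
GUEST_PROGRAM = [
    ("addi", "x1", "x0", 5),
    ("addi", "x2", "x0", 7),
    ("add", "x3", "x1", "x2"),
]


def emulate(program, regs):
    """Emulation by interpretation: decode each guest instruction as it runs."""
    for op, rd, a, b in program:
        if op == "addi":
            regs[rd] = regs[a] + b
        elif op == "add":
            regs[rd] = regs[a] + regs[b]
        else:
            raise ValueError(f"unknown guest instruction: {op}")
    return regs


def translate(program):
    """Binary translation: convert the whole guest program once into host code.

    Python source text stands in for the host's machine language here.
    """
    lines = ["def host_code(regs):"]
    for op, rd, a, b in program:
        if op == "addi":
            lines.append(f"    regs[{rd!r}] = regs[{a!r}] + {b!r}")
        elif op == "add":
            lines.append(f"    regs[{rd!r}] = regs[{a!r}] + regs[{b!r}]")
        else:
            raise ValueError(f"unknown guest instruction: {op}")
    lines.append("    return regs")
    namespace = {}
    exec("\n".join(lines), namespace)  # turn the generated text into a callable
    return namespace["host_code"]


def fresh_registers():
    """Return a zeroed register file for the toy guest machine."""
    return {"x0": 0, "x1": 0, "x2": 0, "x3": 0}


emulated = emulate(GUEST_PROGRAM, fresh_registers())
host_code = translate(GUEST_PROGRAM)
translated = host_code(fresh_registers())

print(emulated["x3"], translated["x3"])  # prints: 12 12
```

Both paths give the same register contents. The difference is where the work happens: `emulate` pays the decoding cost on every run, while `translate` pays it once up front and hands back something the host can run directly.
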