
Compiler Project (Draft)

Build an educational compiler from scratch.

Phase 1

TBD

  • Project name:
    • e.g. Orange

    Not even having a C

    How about VitaminC? I came up with this idea from the C.C. Lemon thing.
    Lai-YT
    VitaminC it is!
    Lee

  • GitHub Organization name:
    • Fruits

    Then, I suggest that we can use fruits as our organization name, and "Food for thought" as the description. 🤟🍊
    Lee

  • Source Language:
    • C (Simple)
  • Target Language:
    • RISC-V (Simple)
    • ARM (Accessible Hardware such as Raspberry Pi or use simulator)

    It would be much more enjoyable if we could run the compiled C executable on an x86-64 computer, wouldn't it? XD (looking forward to supporting it as one of our goals)
    Lai-YT
    Sure, we can add x86-64 as our second target. I chose RISC-V and ARM because they are RISC instruction sets, which are easier to target. Let's add x86-64 to Phase 2.
    Lee

  • Compiler Language:
    • C (Simple, but we need to write our own data structures)
    • C++ (Standard Library support, align with LLVM)
    • Rust (Cargo package management and better documentation support, but less familiar to us)
  • Do we write our own frontend? Or use lex, yacc instead?
    • write our own frontend

    I would like to handcraft the parser, but I currently have no idea how to handle the lexer.
    I will explore how other developers have dealt with the lexer and see if that gives me any ideas.
    Lai-YT
    Another option would be to use Lex and Yacc in Phase 1 to accelerate the development process. We could then prioritize optimization in Phase 2 or use a hand-written parser if optimization is not a major concern.
    But make sure we're careful about using them so that our compiler is well-modularized and each component can evolve with as little pain as possible.
    Lai-YT
    Sure, using Lex and Yacc can save us time once we have finished our compiler homework.
    Lee
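    To make the "handcraft the lexer" option above more concrete, here is a minimal sketch of a hand-written lexer in C. It is only an illustration under assumptions: the `Token` struct, the `TokenKind` names, and `next_token` are hypothetical, keywords are treated as plain identifiers, and every punctuator is a single character. A real C lexer needs much more (multi-character operators, string and character literals, comments).

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds; a real C lexer needs many more. */
typedef enum { TOK_EOF, TOK_INT, TOK_IDENT, TOK_PUNCT } TokenKind;

typedef struct {
    TokenKind kind;
    const char *start; /* points into the source buffer, not a copy */
    size_t len;        /* number of characters in the token */
} Token;

/* Scan one token starting at *src and advance *src past it. */
Token next_token(const char **src) {
    const char *p = *src;

    /* Skip leading whitespace. */
    while (isspace((unsigned char)*p)) p++;

    Token tok = {TOK_EOF, p, 0};
    if (*p == '\0') {
        /* End of input: kind stays TOK_EOF and len stays 0. */
    } else if (isdigit((unsigned char)*p)) {
        tok.kind = TOK_INT;
        while (isdigit((unsigned char)*p)) p++;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        /* Keywords such as "int" are not distinguished in this sketch. */
        tok.kind = TOK_IDENT;
        while (isalnum((unsigned char)*p) || *p == '_') p++;
    } else {
        /* Assumption: every other character is a one-character punctuator. */
        tok.kind = TOK_PUNCT;
        p++;
    }

    tok.len = (size_t)(p - tok.start);
    *src = p;
    return tok;
}
```

    Calling `next_token` repeatedly on `"int x1 = 42;"` should yield two identifiers, `=`, an integer, `;`, and then `TOK_EOF`. Keeping the lexer behind one small function like this is one way to keep the frontend modular, so it could later be swapped for a Lex-generated scanner.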

Goals

  • Simple compiler without IR

    I think there is an advantage to using intermediate representation (IR) early in the compiler development process.
    By doing so, we can leverage the back-end of LLVM after implementing the front-end, rather than having to implement the entire compiler before being able to test it.
    However, I am unsure about the level of difficulty involved in converting the code to IR instead of directly converting it into RISC-V instructions.
    If it requires significant effort, the potential benefits may not justify the extra work involved.
    Lai-YT, revised by ChatGPT
    Nice one ChatGPT 😎

    Do we want to implement our own IR?
    In the long term, yes, I do want us to develop our own IR.
    In the short term, I think it's more complicated, since we don't have much experience in designing an IR. That's why I moved implementing the IR to Phase 2, after we have finished crafting a simple compiler from frontend to backend. Also, I think we will have more ideas about designing an IR once we have implemented our own backend.
    That said, you made a good point about having an existing backend to test against. Maybe we can try to leverage a small compiler backend like QBE?

    Lee

    Designing a new IR seems quite unnecessary, and a custom IR would be hard to use with existing tools.
    While LLVM's IR can be quite complex, the QBE back-end that you suggested seems like a good starting point for beginners like us.

    Lai-YT
    Great! QBE's author even has a yacc example implementation that we can use as a reference.
    Lee
    Cool!
    Lai-YT

  • CI + test

    Will we need a simulator, such as Spike or QEMU, to actually run the compiled executable?
    Lai-YT
    Yes! 🤟
    Lee

  • Documentation
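
    As a rough illustration of the QBE idea discussed under the first goal, here is a sketch of a frontend emitting QBE's textual IL instead of RISC-V assembly. Everything named here is an assumption made for the example: the `Expr` AST, `emit_expr`, and `emit_main` are hypothetical, and the emitted text follows QBE's IL syntax as we understand it (`export function w $main()`, a `@start` label, `=w` assignments, `ret`), so it should be checked against the QBE IL reference before we rely on it.

```c
#include <stdio.h>

/* Hypothetical AST for integer expressions; not a design decision. */
typedef enum { EXPR_NUM, EXPR_ADD, EXPR_MUL } ExprKind;

typedef struct Expr {
    ExprKind kind;
    int value;               /* used when kind == EXPR_NUM */
    const struct Expr *lhs;  /* used by binary nodes */
    const struct Expr *rhs;  /* used by binary nodes */
} Expr;

/* Emit instructions for e; return the number of the temporary holding its value. */
static int emit_expr(FILE *out, const Expr *e, int *next_tmp) {
    if (e->kind == EXPR_NUM) {
        int t = (*next_tmp)++;
        fprintf(out, "\t%%t%d =w copy %d\n", t, e->value);
        return t;
    }
    /* Post-order walk: operands first, then the operation itself. */
    int l = emit_expr(out, e->lhs, next_tmp);
    int r = emit_expr(out, e->rhs, next_tmp);
    int t = (*next_tmp)++;
    fprintf(out, "\t%%t%d =w %s %%t%d, %%t%d\n",
            t, e->kind == EXPR_ADD ? "add" : "mul", l, r);
    return t;
}

/* Wrap the expression in a main function that returns its value. */
void emit_main(FILE *out, const Expr *body) {
    int next_tmp = 0;
    fputs("export function w $main() {\n@start\n", out);
    int result = emit_expr(out, body, &next_tmp);
    fprintf(out, "\tret %%t%d\n}\n", result);
}
```

    For `1 + 2 * 3` this writes three `copy` instructions, one `mul`, one `add`, and a `ret`. The point of the sketch is that the frontend only prints text, so we could test it end to end with QBE as the backend long before writing our own code generator.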

References

  • awesome-compilers
  • chibicc
  • shecc: targeted at 32-bit Arm and RISC-V architecture
  • tinycc: ANSI C. Seems like it now supports RISC-V as well.
  • QBE: A small compiler backend written in C that supports multiple targets, such as amd64 (Linux and macOS), arm64, and riscv64.
  • How I wrote a self-hosting C compiler in 40 days:

    Summary:

    1. Start with small things and expand its features.
    2. Always remember to initialize struct or other objects to zero, or you will get garbage. (I made this mistake several times. 😂)
    3. Shocking how someone can write a C compiler in a month, but he said he had 15 years of experience in C, so I guess that's why. 😧
    4. Rui (author): Although I'm thinking that 8cc is one of the best programs I have ever written, I'd choose a different design than that if I were to write it again. Particularly, I'd use yacc instead of writing a parser by hand and introduce an intermediate language early on.
      ☝️ Something we can think about, but we're newbies compared with his experience. 😂
      Lee
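
    Point 2 of the summary (always zero-initialize structs) could look like this in C. This is only a sketch: the `Node` type and the helpers `new_node` and `make_leaf` are hypothetical names made up for the example, not part of any referenced compiler.

```c
#include <stdlib.h>

/* Hypothetical AST node, for illustration only. */
typedef struct Node {
    int kind;
    struct Node *lhs;
    struct Node *rhs;
    int value;
} Node;

/* Heap allocation: calloc zero-fills the memory, so unset fields start
 * as 0 (and as NULL pointers on mainstream platforms) instead of garbage. */
Node *new_node(int kind) {
    Node *n = calloc(1, sizeof *n);
    if (n == NULL) return NULL; /* caller must handle allocation failure */
    n->kind = kind;
    return n;
}

/* Stack allocation: "= {0}" zero-initializes every member of the struct. */
Node make_leaf(int value) {
    Node n = {0};
    n.value = value;
    return n;
}
```

    With plain `malloc` or an uninitialized local `Node`, the `lhs` and `rhs` pointers would hold indeterminate values, which is the kind of garbage the summary warns about.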

Phase 2

Goals

  • Intermediate Representation
  • Optimization
  • Target multiple backends, including x86-64