# μRust: A Simple Rust Programming Language

**Compiler 2023 Programming Assignment I: Lexical Definition**

**Due Date: 2023/03/31**

[Lab1 link](https://classroom.github.com/a/-zt8cyd_)

Your assignment is to write a scanner for the **μRust** language with **lex**. This document gives the lexical definition of the language. The syntactic definition and code generation will follow in subsequent assignments.

Your programming assignments are built around this division, and later assignments will use the parts of the system you built in earlier ones. In the first assignment you will implement the scanner using lex. In the second assignment you will implement the syntactic definition in yacc.

This definition is subject to modification as the semester progresses. Take care that the code you write is well-structured and easy to revise.

## μRust Language Features

We highlight the features of μRust by comparing it with the C language. It is very important to note that μRust is not Rust.

- μRust is a statically and strongly typed language.
- The conditions of ```if``` and ```while``` are not enclosed in parentheses.
- Simple example:

```rust
fn main() {
    println("Hello World!"); // equivalent to println!("Hello World!"); in Rust
}
```

## Lexical Definitions

Tokens are divided into two classes:

- tokens that will be passed to the parser, and
- tokens that will be discarded by the scanner (i.e., recognized but not passed to the parser).

### Tokens that will be passed to the parser

The following tokens will be recognized by the scanner and will eventually be passed to the parser.
#### Delimiters

| Delimiters | Symbols |
| -------- | -------- |
| Parentheses | ```( ) [ ] { }``` |
| Semicolon | ```;``` |
| Comma | ```,``` |
| Quotation | ```" "``` |
| Newline | ```\n``` |

#### Operators

| Operators | Symbols |
| -------- | -------- |
| Arithmetic | ```+ - * / %``` |
| Relational | ```< > <= >= == !=``` |
| Assignment | ```= += -= *= /= %=``` |
| Logical | ```&& \|\| !``` |
| Bitwise | ```& \| ^ >> <<``` |

#### Keywords

Each of these keywords should be passed back to the parser as a token. The following keywords are reserved words of μRust:

| Category | Keywords |
| -------- | -------- |
| Data type | ```i32 f32 bool char``` |
| Conditional | ```if else``` |
| Loop | ```for while loop``` |
| Variable declaration | ```let``` |
| Function | ```fn return``` |

#### Identifiers

An identifier is a string of letters (a ~ z, A ~ Z, _) and digits (0 ~ 9) that begins with a letter or an underscore. Identifiers are case-sensitive. For example, ```ident```, ```Ident```, and ```IDENT``` are three different identifiers. Note that keywords are not identifiers.

#### Integer Literals and Floating-Point Literals

Integer literals: a sequence of one or more digits, such as ```1```, ```23```, and ```666```.

Floating-point literals: numbers that contain a decimal point, such as ```0.2``` and ```3.141```.

#### String Literals

A string literal is a sequence of zero or more ASCII characters appearing between double-quote (```"```) delimiters, e.g., ```"abc"``` and ```"Hello world"```. A double-quote appearing within a string must be written after a backslash (```\"```).

### Tokens that will be discarded

The following tokens will be recognized by the scanner, but they should be discarded rather than returned to the parser.

#### Whitespace

A sequence of blanks (spaces) and tabs.

#### Comments

Comments can be written in two ways:

- C-style comments are text surrounded by ```/*``` and ```*/``` delimiters and may span more than one line;
- C++-style comments are text following a ```//``` delimiter, running up to the end of the line.
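The two comment forms differ only in how they end. Below is a minimal C sketch of how a scanner could find the end of a comment once it sees an opening delimiter. It is not part of any provided skeleton: ```scan_comment``` and ```CommentKind``` are hypothetical names, and the enumerators simply reuse the token class names from the output table. In a flex specification the same behavior is usually written as rules instead. One approach is a pattern such as ```"//".*``` for line comments (in flex, ```.``` does not match a newline) combined with an exclusive start condition for block comments.

```c
#include <assert.h>
#include <string.h>

/* The enumerators reuse the token class names from the output table. */
typedef enum { NOT_COMMENT, COMMENT, MUTI_LINE_COMMENT } CommentKind;

/*
 * Hypothetical helper: if a comment starts at src[pos], store the index one
 * past its end in *end and return its kind. Otherwise return NOT_COMMENT and
 * leave *end unchanged. The opening delimiter fixes the style, so any other
 * comment delimiters inside the body are treated as ordinary text.
 */
static CommentKind scan_comment(const char *src, size_t pos, size_t *end) {
    assert(src != NULL && end != NULL);

    if (src[pos] != '/') {
        return NOT_COMMENT;
    }
    if (src[pos + 1] == '/') {
        /* C++ style: runs up to, but not including, the next newline. */
        const char *nl = strchr(src + pos + 2, '\n');
        *end = nl ? (size_t)(nl - src) : strlen(src);
        return COMMENT;
    }
    if (src[pos + 1] == '*') {
        /* C style: ends right after the first closing delimiter that follows
         * the opener. An unterminated comment runs to the end of the input. */
        const char *close = strstr(src + pos + 2, "*/");
        *end = close ? (size_t)(close - src) + 2 : strlen(src);
        return MUTI_LINE_COMMENT;
    }
    return NOT_COMMENT;
}
```

Consider the two example comments that follow. The line comment stops at the newline even though it contains ```*/``` and ```/*```. The block comment ends at the first ```*/``` even though it contains ```//``` and a second ```/*```.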
Whichever comment style is encountered first remains in effect until the matching comment close is encountered. For example,

```rust
// this is a comment // line */ /* with /* delimiters */ before the end
```

and

```rust
/* this is a comment // line with some /* and C delimiters */
```

are both valid comments.

#### Other characters

Undefined characters or strings should be discarded by your scanner.

## What Should Your Scanner Do?

### Assignment Requirements

- We have prepared several μRust programs to test the functionality of your scanner.
- Each test program is worth 7 pt. Note that the TAs will prepare hidden test cases to verify that your scanner is not hardcoded to the attached inputs and outputs. A hardcoded scanner will receive 0 pt.
- We use local-judge (```pip3 install local-judge```) to judge your program. You can get your testing score by running ```judge``` in your terminal.

![](https://i.imgur.com/gnJ50d7.png)

- The output messages generated by your scanner must use the token class names listed below:

| Symbol | Token | Symbol | Token | Symbol | Token |
| -------- | -------- | -------- | -------- | -------- | -------- |
| ```+``` | ```ADD``` | ```&&``` | ```LAND``` | ```print``` | ```PRINT``` |
| ```-``` | ```SUB``` | ```\|\|``` | ```LOR``` | ```println``` | ```PRINTLN``` |
| ```*``` | ```MUL``` | ```!``` | ```NOT``` | ```if``` | ```IF``` |
| ```/``` | ```QUO``` | ```(``` | ```LPAREN``` | ```else``` | ```ELSE``` |
| ```%``` | ```REM``` | ```)``` | ```RPAREN``` | ```for``` | ```FOR``` |
| ```>``` | ```GTR``` | ```[``` | ```LBRACK``` | ```i32``` | ```INT``` |
| ```<``` | ```LSS``` | ```]``` | ```RBRACK``` | ```f32``` | ```FLOAT``` |
| ```>=``` | ```GEQ``` | ```{``` | ```LBRACE``` | ```..``` | ```DOTDOT``` |
| ```<=``` | ```LEQ``` | ```}``` | ```RBRACE``` | ```bool``` | ```BOOL``` |
| ```==``` | ```EQL``` | ```;``` | ```SEMICOLON``` | ```true``` | ```TRUE``` |
| ```!=``` | ```NEQ``` | ```,``` | ```COMMA``` | ```false``` | ```FALSE``` |
| ```=``` | ```ASSIGN``` | ```"``` | ```QUOTA``` | ```let``` | ```LET``` |
| ```+=``` | ```ADD_ASSIGN``` | ```\n``` | ```NEWLINE``` | ```mut``` | ```MUT``` |
| ```-=``` | ```SUB_ASSIGN``` | ```:``` | ```COLON``` | ```fn``` | ```FUNC``` |
| ```*=``` | ```MUL_ASSIGN``` | Integer literal | ```INT_LIT``` | ```return``` | ```RETURN``` |
| ```/=``` | ```QUO_ASSIGN``` | Floating-point literal | ```FLOAT_LIT``` | ```break``` | ```BREAK``` |
| ```%=``` | ```REM_ASSIGN``` | String literal | ```STRING_LIT``` | ```as``` | ```AS``` |
| ```&``` | ```BAND``` | Identifier | ```IDENT``` | ```in``` | ```IN``` |
| ```\|``` | ```BOR``` | Comment | ```COMMENT``` / ```MUTI_LINE_COMMENT``` | ```while``` | ```WHILE``` |
| ```~``` | ```BNOT``` | ```->``` | ```ARROW``` | ```loop``` | ```LOOP``` |
| ```>>``` | ```RSHIFT``` | ```<<``` | ```LSHIFT``` | | |

### Example of Your Scanner Output

The following example input and output show what we expect your scanner to generate.

- Input

```rust=
fn main() { // Your first μrust program
    println("Hello World!");
    /* Hello
    World */ /*
    */
}
```

- Output

```=
fn       FUNC
main     IDENT
(        LPAREN
)        RPAREN
{        LBRACE
// Your first μrust program      COMMENT
         NEWLINE
println  PRINTLN
(        LPAREN
"        QUOTA
Hello World!     STRING_LIT
"        QUOTA
)        RPAREN
;        SEMICOLON
         NEWLINE
/* Hello
    World */     MUTI_LINE_COMMENT
/*
    */   MUTI_LINE_COMMENT
         NEWLINE
}        RBRACE

Finish scanning,
total line: 6
comment line: 4
```

### How to debug

- Compile the source code, feed the input to your program, and compare the result with the ground truth.
```bash
$ make clean && make
$ ./myscanner < input/a01_arithmetic.rs >| tmp.out
$ diff -y tmp.out answer/a01_arithmetic.out
```

- Check the output file character by character. Spaces and tabs are different characters.

```bash
$ od -c answer/a01_arithmetic.out
```

## Environment Setup

- For Linux
    - Ubuntu 20.04 LTS
- Install dependencies
    - ```sudo apt install flex bison git python3 python3-pip```

Our grading system uses the Ubuntu environment. We will revise your uploaded code to adapt it to our environment. To facilitate this automated revision process, please arrange your code in the format specified in the Submission section.

## Submission

We use GitHub Classroom to collect assignments from students. For instructions on how to submit assignments, please refer to this [link](https://hackmd.io/oYNCJoSkSA23ss6sjtUkSA?view). Push your code to GitHub before the deadline.

## Reference

- Flex: https://westes.github.io/flex/manual/
- Git documentation: https://git-scm.com/doc

## Further references about Rust (not μRust)

- Rust token documentation: https://doc.rust-lang.org/reference/tokens.html
- Rust Playground: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021