Compiler 2023 Programming Assignment I
Lexical Definition
Due Date: 2023/03/31
Lab1 link
Your assignment is to write a scanner for the μRust language with lex. This document gives the lexical definition of the language, while the syntactic definition and code generation will follow in subsequent assignments.
Your programming assignments are based around this division and later assignments will use the parts of the system you have built in the earlier assignments. That is, in the first assignment you will implement the scanner using lex, in the second assignment you will implement the syntactic definition in yacc.
This definition is subject to modification as the semester progresses. You should take care in implementation that the codes you write are well-structured and able to be revised easily.
We highlight the features of μRust by comparing it with C language. It is very important to note that μRust is not Rust.
if
and while
does not enclosed by parentheses.fn main() {
println("Hello World!"); // equivalent to println!("Hello World!"); in rust
}
Tokens are divided into two classes:
The following tokens will be recognized by the scanner and will be eventually passed to the parser.
Delimiters | Symbols |
---|---|
Parentheses | ( ) [ ] { } |
Semicolon | ; |
Comma | , |
Quotation | " " |
Newline | \n |
Operators | Symbols |
---|---|
Arithmetic | + - * / |
Relational | < > <= >= == != |
Assignment | = += -= *= /= %= |
Logical | && || ! |
Bitwise | & | ^ >> << |
Each of these keywords should be passed back to the parser as a token.
The following keywords are reserved words of μRust:
Types | Keywords |
---|---|
Data type | i32 f32 bool char |
Conditional | if else for while loop |
Variable declaration | let |
Functional | fn return |
An identifier is a string of letters ( a ~ z , A ~ Z , _ ) and digits ( 0 ~ 9 ) and it begins with a letter or underscore. Identifiers are case-sensative; for example, ident , Ident , and IDENT are not the same identifier. Note that keywords are not identifiers.
Integer literals: a sequence of one or more digits, such as 1, 23 , and 666 . Floating-point literals: numbers that contain floating decimal points, such as 0.2 and 3.141 .
A string literal is a sequence of zero or more ASCII characters appearing between double-quote ( " ) delimiters. A double-quote appearing with a string must be written after a " , e.g., "abc" and "Hello world" .
The following tokens will be recognized by the scanner, but should be discarded, rather than returning to the parser.
A sequence of blanks (spaces), tabs.
Comments can be added in several ways:
Whichever comment style is encountered first remains in effect until the appropriate comment close is encountered. For example,
// this is a comment // line */ /* with /* delimiters */ before the end
and
/* this is a comment // line with some /* and C delimiters */
are both valid comments.
The undefined characters or strings should be discarded by your scanner during parsing.
pip3 install local-judge
) to judge your program. You can use the judge program to get the testing score by typing judge in your terminal.Symbol | Token | Symbol | Token | Symbol | Token | ||
---|---|---|---|---|---|---|---|
+ |
ADD |
- | && |
LAND |
- | print |
PRINT |
- |
SUB |
- | || |
LOR |
- | println |
PRINTLN |
* |
MUL |
- | ! |
NOT |
- | if |
IF |
/ |
QUO |
- | ( |
LPAREN |
- | else |
ELSE |
% |
REM |
- | ) |
RPAREN |
- | for |
FOR |
> |
GTR |
- | [ |
LBRACK |
- | i32 |
INT |
< |
LSS |
- | ] |
RBRACK |
- | f32 |
FLOAT |
>= |
GEQ |
- | { |
LBRACE |
- | .. |
DOTDOT |
<= |
LEQ |
- | } |
RBRACE |
- | bool |
BOOL |
== |
EQL |
- | ; |
SEMICOLON |
- | true |
TRUE |
!= |
NEQ |
- | , |
COMMA |
- | false |
FALSE |
= |
ASSIGN |
- | " |
QUOTA |
- | let |
LET |
+= |
ADD_ASSIGN |
- | \n |
NEWLINE |
- | mut |
MUT |
-= |
SUB_ASSIGN |
- | : |
COLON |
- | fn |
FUNC |
*= |
MUL_ASSIGN |
- | Int Number |
INT_LIT |
- | return |
RETURN |
/= |
QUO_ASSIGN |
- | Float Number |
FLOAT_LIT |
- | break |
BREAK |
%= |
REM_ASSIGN |
- | String Literal |
STRING_LIT |
- | as |
AS |
& |
BAND |
- | Identifier |
IDENT |
- | in |
IN |
| |
BOR |
- | Comment |
COMMENT / MUTI_LINE_COMMENT |
- | while |
WHILE |
~ |
BNOT |
- | -> |
ARROW |
- | loop |
LOOP |
>> |
RSHIFT |
- | << |
LSHIFT |
The example input code and the corresponding output that we expect your scanner to generate are as follows.
fn main() { // Your first μrust program
println("Hello World!");
/* Hello
World */ /*
*/
}
fn FUNC
main IDENT
( LPAREN
) RPAREN
{ LBRACE
// Your first μrust program COMMENT
NEWLINE
println PRINTLN
( LPAREN
" QUOTA
Hello World! STRING_LIT
" QUOTA
) RPAREN
; SEMICOLON
NEWLINE
/* Hello
World */ MUTI_LINE_COMMENT
/*
*/ MUTI_LINE_COMMENT
NEWLINE
} RBRACE
Finish scanning,
total line: 6
comment line: 4
$ make clean && make
$ ./myscanner < input/a01_arithmetic.rs >| tmp.out
$ diff -y tmp.out answer/a01_arithmetic.out
$ od -c answer/a01_arithmetic.out
sudo apt install flex bison git python3 python3-pip
Our grading system uses the Ubuntu environment. We will revise your uploaded code to adapt to our environment. In order to facilitate the automated code revision process, we need your help to arrange your code in the following format as specified in 5. Submission.
We use GitHub Classroom to collect assignments from students. For instructions on how to submit assignments, please refer to the link. Push your code to Github before the deadline.