μRust: A Simple Rust Programming Language

Compiler 2023 Programming Assignment I
Lexical Definition
Due Date: 2023/03/31

Lab1 link

Your assignment is to write a scanner for the μRust language with lex. This document gives the lexical definition of the language, while the syntactic definition and code generation will follow in subsequent assignments.

The programming assignments are organized around this division, and later assignments will build on the parts of the system you wrote in earlier ones. That is, in the first assignment you will implement the scanner using lex, and in the second assignment you will implement the syntactic definition in yacc.

This definition is subject to modification as the semester progresses. Take care that the code you write is well structured and easy to revise.

μRust Language Features

We highlight the features of μRust by comparing it with the C language. It is very important to note that μRust is not Rust.

  • μRust is a statically typed and strongly typed language.
  • The conditions of if and while are not enclosed in parentheses.
  • Simple example
fn main() {
    println("Hello World!"); // equivalent to println!("Hello World!"); in Rust
}

Lexical Definitions

Tokens are divided into two classes:

  • tokens that will be passed to the parser, and
  • tokens that will be discarded by the scanner (i.e., recognized but not passed to the parser).

Tokens that will be passed to the parser

The following tokens will be recognized by the scanner and eventually passed to the parser.

Delimiters

Delimiters Symbols
Parentheses ( ) [ ] { }
Semicolon ;
Comma ,
Quotation " "
Newline \n

Operators

Operators Symbols
Arithmetic + - * / %
Relational < > <= >= == !=
Assignment = += -= *= /= %=
Logical && || !
Bitwise & | ^ >> <<

Keywords

Each of these keywords should be passed back to the parser as a token.
The following keywords are reserved words of μRust:

Types Keywords
Data type i32 f32 bool char
Conditional if else for while loop
Variable declaration let
Functional fn return

Identifiers

An identifier is a string of letters (a ~ z, A ~ Z, _) and digits (0 ~ 9) that begins with a letter or an underscore. Identifiers are case-sensitive; for example, ident, Ident, and IDENT are three different identifiers. Note that keywords are not identifiers.
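The identifier rule can be sketched in plain C as follows. This is only an illustration of the definition above, not the required lex solution; the function names `is_keyword` and `is_identifier` are hypothetical, and the keyword list is copied from the Keywords table, so it may need extending if the definition changes.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Reserved words taken from the Keywords table (assumed complete for this sketch). */
static const char *const KEYWORDS[] = {
    "i32", "f32", "bool", "char", "if", "else", "for",
    "while", "loop", "let", "fn", "return",
};

/* Returns true if s is exactly one of the reserved words (case-sensitive). */
bool is_keyword(const char *s) {
    for (size_t i = 0; i < sizeof KEYWORDS / sizeof KEYWORDS[0]; i++) {
        if (strcmp(s, KEYWORDS[i]) == 0) return true;
    }
    return false;
}

/* Returns true if s begins with a letter or '_', continues with
 * letters, digits, or '_', and is not a keyword. */
bool is_identifier(const char *s) {
    if (!(isalpha((unsigned char)s[0]) || s[0] == '_')) return false;
    for (size_t i = 1; s[i] != '\0'; i++) {
        if (!(isalnum((unsigned char)s[i]) || s[i] == '_')) return false;
    }
    return !is_keyword(s);
}
```

In a lex specification the same effect is usually achieved by listing the keyword rules before a general identifier pattern, since lex prefers the earlier rule when two patterns match the same text.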

Integer Literals and Floating-Point Literals

Integer literals: a sequence of one or more digits, such as 1, 23, and 666. Floating-point literals: numbers that contain a decimal point, such as 0.2 and 3.141.
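A minimal C sketch of this distinction is shown below. It assumes that a floating-point literal has at least one digit on each side of the decimal point, which matches the examples 0.2 and 3.141 but is not stated explicitly in the definition; the names `NumKind` and `classify_number` are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

typedef enum { NUM_NONE, NUM_INT, NUM_FLOAT } NumKind;

/* Classifies a whole string as an integer literal, a floating-point
 * literal, or neither. Assumption: a float is digits '.' digits. */
NumKind classify_number(const char *s) {
    size_t i = 0;

    /* Integer part: one or more digits. */
    while (isdigit((unsigned char)s[i])) i++;
    if (i == 0) return NUM_NONE;
    if (s[i] == '\0') return NUM_INT;

    /* Anything other than a decimal point here is not a number. */
    if (s[i] != '.') return NUM_NONE;
    i++;

    /* Fractional part: one or more digits, then end of string. */
    size_t frac_start = i;
    while (isdigit((unsigned char)s[i])) i++;
    if (i == frac_start || s[i] != '\0') return NUM_NONE;
    return NUM_FLOAT;
}
```

Requiring digits after the decimal point also means that text such as `1.` is not accepted as a float, which matters because `..` is a separate token in the table of token classes.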

String Literals

A string literal is a sequence of zero or more ASCII characters appearing between double-quote ( " ) delimiters. A double-quote appearing within a string must be written after a " , e.g., "abc" and "Hello world".

Tokens that will be discarded

The following tokens will be recognized by the scanner but should be discarded rather than returned to the parser.

Whitespace

A sequence of blanks (spaces) and tabs.

Comments

Comments can be added in several ways:

  • C-style comments are text surrounded by /* and */ delimiters, which may span more than one line;
  • C++-style comments are text following a // delimiter and running up to the end of the line.

Whichever comment style is encountered first remains in effect until the appropriate comment close is encountered. For example,

// this is a comment // line */ /* with /* delimiters */ before the end

and

/* this is a comment // line with some /* and C delimiters */

are both valid comments.
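The "first opener wins" rule can be illustrated with a small C helper. This is a sketch under stated assumptions, not the lex rules the assignment asks for: the function name `comment_length` is hypothetical, and treating an unterminated /* comment as running to the end of the input is a choice made here, not something the definition specifies.

```c
#include <stddef.h>
#include <string.h>

/* Returns the length in bytes of the comment that starts at s,
 * or 0 if s does not start with a comment opener.
 * - A // comment ends just before the next newline (or at end of input).
 * - A block comment ends at the first closing delimiter after the opener;
 *   if none exists it is assumed to run to the end of the input. */
size_t comment_length(const char *s) {
    if (s[0] != '/') return 0;

    if (s[1] == '/') {
        /* Everything up to, but not including, the newline. */
        return strcspn(s, "\n");
    }

    if (s[1] == '*') {
        /* Search after the opener so that "/*" followed by "/" is not
         * mistaken for an already closed comment. */
        const char *end = strstr(s + 2, "*/");
        return end ? (size_t)(end - s) + 2 : strlen(s);
    }

    return 0;
}
```

Applied to the two examples above, the first is consumed as a single // comment up to the end of the line, and the second as a single block comment ending at the first */, regardless of the other delimiters that appear inside.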

Other characters

Undefined characters or strings should be discarded by your scanner during scanning.

What Should Your Scanner Do?

Assignment Requirements

  • We have prepared several μRust programs to test the functionality of your scanner.
  • Each test program is worth 7pt. Note that the TA will prepare hidden test cases to verify that your scanner is not hardcoded to the attached inputs and outputs. A hardcoded scanner will receive 0pt.
  • We use local-judge ( pip3 install local-judge ) to judge your program. You can get your test score by typing judge in your terminal.

  • The output messages generated by your scanner must use the given names of token classes listed below:
Symbol  Token        Symbol          Token                        Symbol   Token
+       ADD          &&              LAND                         print    PRINT
-       SUB          ||              LOR                          println  PRINTLN
*       MUL          !               NOT                          if       IF
/       QUO          (               LPAREN                       else     ELSE
%       REM          )               RPAREN                       for      FOR
>       GTR          [               LBRACK                       i32      INT
<       LSS          ]               RBRACK                       f32      FLOAT
>=      GEQ          {               LBRACE                       ..       DOTDOT
<=      LEQ          }               RBRACE                       bool     BOOL
==      EQL          ;               SEMICOLON                    true     TRUE
!=      NEQ          ,               COMMA                        false    FALSE
=       ASSIGN       "               QUOTA                        let      LET
+=      ADD_ASSIGN   \n              NEWLINE                      mut      MUT
-=      SUB_ASSIGN   :               COLON                        fn       FUNC
*=      MUL_ASSIGN   Int Number      INT_LIT                      return   RETURN
/=      QUO_ASSIGN   Float Number    FLOAT_LIT                    break    BREAK
%=      REM_ASSIGN   String Literal  STRING_LIT                   as       AS
&       BAND         Identifier      IDENT                        in       IN
|       BOR          Comment         COMMENT / MUTI_LINE_COMMENT  while    WHILE
~       BNOT         ->              ARROW                        loop     LOOP
>>      RSHIFT       <<              LSHIFT
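Several symbols in this table are prefixes of longer ones (for example > and >=, or - and ->). lex resolves this by always preferring the longest match. The C sketch below imitates that behavior by hand for the operator symbols only; the names `OpEntry`, `OPS`, and `match_operator` are hypothetical, and comment openers such as // are deliberately not handled here.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *symbol; /* operator text as it appears in the source */
    const char *name;   /* token class name from the table above */
} OpEntry;

/* Two-character operators are listed first so that they win over
 * their one-character prefixes during the linear search. */
static const OpEntry OPS[] = {
    {">>", "RSHIFT"},     {"<<", "LSHIFT"},     {">=", "GEQ"},
    {"<=", "LEQ"},        {"==", "EQL"},        {"!=", "NEQ"},
    {"&&", "LAND"},       {"||", "LOR"},        {"+=", "ADD_ASSIGN"},
    {"-=", "SUB_ASSIGN"}, {"*=", "MUL_ASSIGN"}, {"/=", "QUO_ASSIGN"},
    {"%=", "REM_ASSIGN"}, {"->", "ARROW"},      {"..", "DOTDOT"},
    {"+", "ADD"},         {"-", "SUB"},         {"*", "MUL"},
    {"/", "QUO"},         {"%", "REM"},         {">", "GTR"},
    {"<", "LSS"},         {"=", "ASSIGN"},      {"!", "NOT"},
    {"&", "BAND"},        {"|", "BOR"},         {"~", "BNOT"},
};

/* Returns the token class of the longest operator that s starts with,
 * or NULL if s does not start with an operator. When len is not NULL,
 * the number of characters matched is stored in *len. */
const char *match_operator(const char *s, size_t *len) {
    for (size_t i = 0; i < sizeof OPS / sizeof OPS[0]; i++) {
        size_t n = strlen(OPS[i].symbol);
        if (strncmp(s, OPS[i].symbol, n) == 0) {
            if (len) *len = n;
            return OPS[i].name;
        }
    }
    return NULL;
}
```

In an actual lex specification no such ordering is needed for this purpose, because the longest-match rule applies regardless of the order in which the operator rules are written.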

Example of Your Scanner Output

The example input code and the corresponding output that we expect your scanner to generate are as follows.

  • Input
fn main() { // Your first μrust program
    println("Hello World!");
    /* Hello
    World */ /*
    */
}
  • Output
fn              FUNC
main            IDENT
(               LPAREN
)               RPAREN
{               LBRACE
// Your first μrust program    COMMENT
                NEWLINE
println         PRINTLN
(               LPAREN
"               QUOTA
Hello World!    STRING_LIT
"               QUOTA
)               RPAREN
;               SEMICOLON
                NEWLINE
/* Hello
    World */    MUTI_LINE_COMMENT
/*
    */          MUTI_LINE_COMMENT
                NEWLINE
}               RBRACE

Finish scanning,
total line: 6
comment line: 4

How to debug

  • Compile the source code, feed the input to your scanner, and compare the result with the expected output.
$ make clean && make
$ ./myscanner < input/a01_arithmetic.rs >| tmp.out
$ diff -y tmp.out answer/a01_arithmetic.out
  • Check the output file character by character (spaces and tabs are different characters).
$ od -c answer/a01_arithmetic.out

Environmental Setup

  • For Linux
    • Ubuntu 20.04 LTS
  • Install Dependencies
    • sudo apt install flex bison git python3 python3-pip

Our grading system runs on Ubuntu, and we will adapt your uploaded code to that environment. To facilitate this automated process, please arrange your code in the format specified in the Submission section.

Submission

We use GitHub Classroom to collect assignments from students. For instructions on how to submit assignments, please refer to the link. Push your code to GitHub before the deadline.

Reference

Further references about Rust (not μRust)