Compilers - Montana State University

CompilersCSCI 468Spring 2018

Mitch BakerTyler SassamanSteven Thompson

Section 1: Program

For our compiler we had many different files, most of them were automatically generated by ANTLR.However, we created/edited the following files:

• Little.g4 : the grammar specification file for the toy LITTLE language.

• Driver.java: the main file in the system, it gathers the input program and begins the compiling process.

• Symbol.java: the object that we stored in our symbol tables.

• SymbolTable.java the class which allows us to create and store symbol tables.

• Listener.java: our implementation of the ANTLR Listener interface.

• IRBuilder.g4 : our implementation of the intermediate instruction step.

• TinyBuilder.g4 : our class which converted the intermediate instructions to TINY instructions.

Little.g4

grammar L i t t l e ;

/∗ Program ∗/program : ’PROGRAM’ id ’BEGIN’ pgm body ’END’ ;id : IDENTIFIER ;pgm body : dec l f u n c d e c l a r a t i o n s ;de c l : s t r i n g d e c l de c l | v a r d e c l de c l | /∗empty ∗/ ;

/∗ Global S t r ing Dec la ra t i on ∗/s t r i n g d e c l : ’STRING’ id ’ := ’ s t r ’ ; ’ ;s t r : STRINGLITERAL;

/∗ Var iab le Dec la ra t i on ∗/v a r d e c l : var type i d l i s t ’ ; ’ ;var type : ’FLOAT’ | ’ INT ’ ;any type : var type | ’VOID’ ;i d l i s t : id i d t a i l ;i d t a i l : ’ , ’ id i d t a i l | /∗empty ∗/ ;

/∗ Function Paramater L i s t ∗/p a r a m d e c l l i s t : param decl p a r a m d e c l t a i l | /∗empty ∗/ ;param decl : var type id ;p a r a m d e c l t a i l : ’ , ’ param decl p a r a m d e c l t a i l | /∗empty ∗/ ;

/∗ Function Dec l a ra t i on s ∗/f u n c d e c l a r a t i o n s : f u n c d e c l f u n c d e c l a r a t i o n s | /∗empty ∗/ ;f u n c d e c l : ’FUNCTION’ any type id ’ ( ’ p a r a m d e c l l i s t ’ ) ’ ’BEGIN’ func body ’END’ ;func body : de c l s t m t l i s t ;

/∗ Statement L i s t ∗/s t m t l i s t : stmt s t m t l i s t | /∗empty ∗/ ;stmt : base stmt | i f s t m t | whi l e s tmt ;base stmt : a s s i gn s tmt | read stmt | wr i t e s tmt | r e turn s tmt ;

/∗ Basic Statements ∗/as s i gn s tmt : a s s i g n e x p r ’ ; ’ ;a s s i g n e x p r : id ’ := ’ expr ;

1

read stmt : ’READ’ ’ ( ’ i d l i s t ’ ) ’ ’ ; ’ ;wr i t e s tmt : ’WRITE’ ’ ( ’ i d l i s t ’ ) ’ ’ ; ’ ;r e turn s tmt : ’RETURN’ expr ’ ; ’ ;

/∗ Expres s ions ∗/expr : e x p r p r e f i x f a c t o r ;e x p r p r e f i x : e x p r p r e f i x f a c t o r addop | /∗empty ∗/ ;f a c t o r : f a c t o r p r e f i x p o s t f i x e x p r ;f a c t o r p r e f i x : f a c t o r p r e f i x p o s t f i x e x p r mulop | /∗empty ∗/ ;p o s t f i x e x p r : primary | c a l l e x p r ;c a l l e x p r : id ’ ( ’ e x p r l i s t ’ ) ’ ;e x p r l i s t : expr e x p r l i s t t a i l | /∗empty ∗/ ;e x p r l i s t t a i l : ’ , ’ expr e x p r l i s t t a i l | /∗empty ∗/ ;primary : ’ ( ’ expr ’ ) ’ | id | INTLITERAL | FLOATLITERAL;addop : ’+ ’ | ’− ’ ;mulop : ’∗ ’ | ’ / ’ ;

/∗ Complex Statements and Condit ion ∗/i f s t m t : ’ IF ’ ’ ( ’ cond ’ ) ’ d e c l s t m t l i s t e l s e p a r t ’ENDIF ’ ;e l s e p a r t : ’ELSE’ dec l s t m t l i s t | /∗empty ∗/ ;cond : expr compop expr ;compop : ’< ’ | ’> ’ | ’= ’ | ’ != ’ | ’<=’ | ’>=’;

/∗ While statements ∗/whi l e s tmt : ’WHILE’ ’ ( ’ cond ’ ) ’ d e c l s t m t l i s t ’ENDWHILE’ ;

tokens : .∗? EOF;KEYWORD : ( ’PROGRAM’ | ’ BEGIN’ | ’FUNCTION’ | ’READ’ | ’WRITE’ | ’ ELSE ’ | ’ ENDIF ’ | ’ENDWHILE’ |

’ FI ’ | ’FOR’ | ’ROF’ | ’RETURN’ | ’ INT ’ | ’ VOID’ | ’ STRING’ | ’FLOAT’ | ’END’ | ’ IF ’ | ’WHILE’ ) ;STRINGLITERAL: ’ ” ’ ( . ∗ ? ) ’ ” ’ ;IDENTIFIER : [ a−zA−Z ] [ a−zA−Z0−9]∗ ;INTLITERAL: [0−9]+;FLOATLITERAL: [0 −9 ]∗ ’ . ’ [ 0 −9 ]+;OPERATOR: ( ’ : = ’ | ’ + ’ | ’ − ’ | ’∗ ’ | ’ / ’ | ’ = ’ | ’ ! = ’ | ’ < ’ | ’ > ’ | ’ ( ’ | ’ ) ’ | ’ ; ’ | ’ , ’ | ’ <= ’ | ’ >= ’ ) ;COMMENT: (’−− ’ ˜ [\n\ r ] ∗ ) −>sk ip ;WS: [ \ r \n\ t ]+ −> sk ip ;

Driver.java

import org . a n t l r . v4 . runtime . ∗ ;import org . a n t l r . v4 . runtime . t r e e . ∗ ;import java . i o . ∗ ;import java . u t i l . ArrayList ;

pub l i c c l a s s Dr iver {pub l i c s t a t i c void main ( St r ing [ ] a rgs ) throws Exception {

St r ing t e s t c a s e f i l e n a m e = args [ 0 ] ;t ry {CharStream stream = CharStreams . fromFileName ( t e s t c a s e f i l e n a m e ) ;L i t t l e L e x e r l e x e r = new L i t t l e L e x e r ( stream ) ;

CommonTokenStream tokens = new CommonTokenStream( l e x e r ) ;L i t t l e P a r s e r par s e r = new L i t t l e P a r s e r ( tokens ) ;

2

ParseTreeWalker walker = new ParseTreeWalker ( ) ;L i s t e n e r l i s t e n = new L i s t e n e r ( ) ;walker . walk ( l i s t e n , pa r s e r . program ( ) ) ;

} catch ( FileNotFoundException ex ) {System . out . p r i n t l n (” Please ente r a v a l i d f i l e name . ” ) ;

} catch ( IOException ex ) {System . out . p r i n t l n (” There was an er ror , s o r ry . ” ) ;

}}

}

Symbol.java

pub l i c c l a s s Symbol {

p r i v a t e St r ing type ;p r i v a t e St r ing name ;p r i v a t e St r ing value ;

pub l i c Symbol ( S t r ing type , S t r ing name) {t h i s . type = type ;t h i s . name = name ;

}

pub l i c Symbol ( S t r ing type , S t r ing name , S t r ing value ) {t h i s . type = type ;t h i s . name = name ;t h i s . va lue = value ;

}

pub l i c S t r ing getType ( ) { re turn type ; }pub l i c S t r ing getName ( ) { re turn name ; }pub l i c S t r ing getValue ( ) { re turn value ; }pub l i c void p r in t ( ) {

System . out . p r i n t l n ( t oS t r i ng ( ) ) ;}

@Overridepub l i c S t r ing toS t r i ng ( ) {

St r ing p r in t = ”name ” + name + ” type ” + type ;i f ( va lue != n u l l ) {

pr in t += ” value ” + value ;}re turn p r in t ;

}

SymbolTable.java

import java . u t i l . ∗ ;

pub l i c c l a s s SymbolTable {

3

p r i v a t e S t r ing name ;p r i v a t e HashMap<Str ing , Symbol> symbols ;p r i v a t e ArrayList<Symbol> inOrder ;p r i v a t e HashMap<Str ing , Symbol> ancestorSymbols ;p r i v a t e SymbolTable parent ;p r i v a t e ArrayList<SymbolTable> c h i l d r e n ;

pub l i c SymbolTable ( SymbolTable parent , HashMap<Str ing , Symbol> ancestorSymbols ){t h i s . parent = parent ;t h i s . ancestorSymbols =

( ancestorSymbols == n u l l ) ? new HashMap<>() : ancestorSymbols ;symbols = new HashMap<>();inOrder = new ArrayList <>();c h i l d r e n = new ArrayList <>();

}

pub l i c void setName ( St r ing name) {t h i s . name = name ;

}

pub l i c void addSymbol ( Symbol symbol ) {i f ( ! e x i s t s ( symbol ) ) {

symbols . put ( symbol . getName ( ) , symbol ) ;inOrder . add ( symbol ) ;

} e l s e {System . out . p r i n t l n (”DECLARATION ERROR ” + symbol . getName ( ) ) ;System . e x i t ( 1 ) ;

}}

pub l i c void addChild ( SymbolTable t ab l e ) { c h i l d r e n . add ( t a b l e ) ; }

pub l i c SymbolTable c r ea t eCh i ld ( ) {SymbolTable c h i l d = new SymbolTable ( th i s , packageSymbols ( ) ) ;addChild ( c h i l d ) ;r e turn c h i l d ;

}

pub l i c SymbolTable getParent ( ) { re turn parent ;}

pub l i c void pr intTable ( ) {System . out . p r i n t l n (” Symbol t a b l e ” + name ) ;f o r ( Symbol symbol : inOrder ) {

symbol . p r i n t ( ) ;}

}

pub l i c void p r i n t A l l ( ) {pr intTable ( ) ;f o r ( SymbolTable t a b l e : c h i l d r e n ) {

System . out . p r i n t l n ( ) ; // l i n e spacet a b l e . p r i n t A l l ( ) ;

}}

4

p r i v a t e boolean e x i s t s ( Symbol symbol ) {i f ( symbols . containsKey ( symbol . getName ( ) ) ) {

re turn true ;} e l s e {

re turn f a l s e ;}

}

p r i v a t e HashMap<Str ing , Symbol> packageSymbols ( ) {HashMap<Str ing , Symbol> temp = new HashMap<>();temp . putAl l ( symbols ) ;temp . putAl l ( ancestorSymbols ) ;r e turn temp ;

}

pub l i c Symbol searchSymbol ( S t r ing name) {HashMap<Str ing , Symbol> a l lSymbols = packageSymbols ( ) ;r e turn a l lSymbols . get (name ) ;

}}

Listener.java

import java . u t i l . ArrayList ;import java . u t i l . HashMap ;import org . a n t l r . v4 . runtime . Parser ;import org . a n t l r . v4 . runtime . t r e e . ParseTree ;

pub l i c c l a s s L i s t e n e r extends L i t t l e B a s e L i s t e n e r {

SymbolTable s ;IRBui lder i r ;Symbol symbol ;boolean newTable , newTableHeader , programHeader = f a l s e ;S t r ing variableType ,

var iab l eVa lue ;ArrayList<Str ing> variableName = new ArrayList <>();i n t b lock = 1 ;i n t expres s ionCept ion = 0 ;

@Overridepub l i c void enterProgram ( L i t t l e P a r s e r . ProgramContext ctx ) {

s = new SymbolTable ( nu l l , n u l l ) ;i r = new IRBui lder ( s ) ;s . setName (”GLOBAL” ) ;programHeader = true ;

}

@Overridepub l i c void exitProgram ( L i t t l e P a r s e r . ProgramContext ctx ) {

i r . endProgram ( ) ;}

5

@Overridepub l i c void ente rFunc dec l ( L i t t l e P a r s e r . Func declContext ctx ) {

pushSymbolTable ( ) ;newTable = true ;newTableHeader = true ;// todo : a l low f o r mu l t ip l e func t i on d e c l a r a t i o n si r . enterMain ( ) ;

}

@Overridepub l i c void enterFunc body ( L i t t l e P a r s e r . Func bodyContext ctx ) {

newTableHeader = f a l s e ;}

@Overridepub l i c void ex i tFunc dec l ( L i t t l e P a r s e r . Func declContext ctx ) {

popSymbolTable ( ) ;}

@Overridepub l i c void ente rVar dec l ( L i t t l e P a r s e r . Var declContext ctx ) {

variableName . c l e a r ( ) ;}

@Overridepub l i c void e x i t V a r d e c l ( L i t t l e P a r s e r . Var declContext ctx ) {

f o r ( S t r ing name : variableName ) {s . addSymbol (new Symbol ( variableType , name ) ) ;

}var iableType = n u l l ;variableName . c l e a r ( ) ;

}

@Override pub l i c void enterVar type ( L i t t l e P a r s e r . Var typeContext ctx ) {var iableType = ctx . getText ( ) ;

}

@Overridepub l i c void ente r Id ( L i t t l e P a r s e r . IdContext ctx ) {

i f ( newTable ) {s . setName ( ctx . getText ( ) ) ;variableName . c l e a r ( ) ;newTable = f a l s e ;

} e l s e i f ( newTableHeader )s . addSymbol (new Symbol ( variableType , ctx . getText ( ) ) ) ;var iableType = n u l l ;

} e l s e i f ( programHeader ) {programHeader = f a l s e ;

} e l s e {variableName . add ( ctx . getText ( ) ) ;

}}

@Override

6

pub l i c void e x i t I d ( L i t t l e P a r s e r . IdContext ctx ) {}

@Overridepub l i c void e n t e r S t r i n g d e c l ( L i t t l e P a r s e r . S t r ing dec lContex t ctx ) {

var iableType = ”STRING” ;}

@Overridepub l i c void e x i t S t r i n g d e c l ( L i t t l e P a r s e r . S t r ing dec lContex t ctx ) {

s . addSymbol (new Symbol ( variableType ,variableName . get ( variableName . s i z e ( ) − 1) , var i ab l eVa lue ) ) ;variableName . c l e a r ( ) ;var iableType = n u l l ;va r i ab l eVa lue = n u l l ;

}

@Overridepub l i c void en t e rS t r ( L i t t l e P a r s e r . StrContext ctx ) {

var iab l eVa lue = ctx . getText ( ) ;}

@Overridepub l i c void ente rAss i gn expr ( L i t t l e P a r s e r . Ass ign exprContext ctx ) {

i n t count = ctx . getChildCount ( ) ;}

@Overridepub l i c void enterExpr ( L i t t l e P a r s e r . ExprContext ctx ) {

i r . ente rExpre s s i on ( ) ;expres s ionCept ion++;

}

@Overridepub l i c void exitExpr ( L i t t l e P a r s e r . ExprContext ctx ) {

i f (−−expres s ionCept ion == 0){i r . e x i tExpre s s i on ( t rue ) ;

} e l s e {i r . e x i tExpre s s i on ( f a l s e ) ;

}}

@Overridepub l i c void e n t e r Ex pr p r e f i x ( L i t t l e P a r s e r . Expr pre f ixContext ctx ){}

@Overridepub l i c void enterFactor ( L i t t l e P a r s e r . FactorContext ctx ) {

i n t count = ctx . getChildCount ( ) ;}

@Overridepub l i c void e n t e r P o s t f i x e x p r ( L i t t l e P a r s e r . Pos t f ix exprContext ctx ){}

@Override

7

pub l i c void enterPrimary ( L i t t l e P a r s e r . PrimaryContext ctx ){i f ( ctx . getChildCount ( ) == 1){

i r . addElement ( ctx . getText ( ) ) ;}

}

@Overridepub l i c void enterAddop ( L i t t l e P a r s e r . AddopContext ctx ) {

i r . addOperator ( ctx . getText ( ) ) ;}

@Overridepub l i c void enterMulop ( L i t t l e P a r s e r . MulopContext ctx ){

i r . addOperator ( ctx . getText ( ) ) ;}

@Overridepub l i c void ex i tAs s i gn exp r ( L i t t l e P a r s e r . Ass ign exprContext ctx ){

i r . ass ignmentStatement ( ctx . getChi ld ( 0 ) . getText ( ) ) ;}

@Overridepub l i c void exitCond ( L i t t l e P a r s e r . CondContext ctx ) {

i n t count = ctx . getChildCount ( ) ;S t r ing [ ] s e t = new St r ing [ count ] ;f o r ( i n t i = 0 ; i < count ; i++) {

s e t [ i ] = ctx . getChi ld ( i ) . getText ( ) ;}i r . parseComparison ( s e t ) ;

}

@Overridepub l i c void e n t e r I f s t m t ( L i t t l e P a r s e r . I f s tmtContext ctx ) {

pushSymbolTable ( ) ;s . setName (”BLOCK ” + block ) ;b lock++;i r . s e tCond i t i on (” i f ” ) ;

}

@Overridepub l i c void e x i t I f s t m t ( L i t t l e P a r s e r . I f s tmtContext ctx ) {

popSymbolTable ( ) ;i r . e x i t I f ( ) ;i r . s e tCond i t i on ( n u l l ) ;

}

@Overridepub l i c void e n t e r E l s e p a r t ( L i t t l e P a r s e r . E l se partContext ctx ) {i f ( ctx . getChildCount ( ) > 0) {

popSymbolTable ( ) ;pushSymbolTable ( ) ;s . setName (”BLOCK ” + block ) ;b lock++;i r . en te rE l s ePar t ( ) ;

8

}}

@Overridepub l i c void e x i t E l s e p a r t ( L i t t l e P a r s e r . E l se partContext ctx ) {}

@Overridepub l i c void enterWhi le stmt ( L i t t l e P a r s e r . While stmtContext ctx ) {

pushSymbolTable ( ) ;s . setName (”BLOCK ” + block ) ;b lock++;i r . s e tCond i t i on (” whi l e ” ) ;

}

@Overridepub l i c void ex i tWhi le s tmt ( L i t t l e P a r s e r . While stmtContext ctx ) {

popSymbolTable ( ) ;i r . endWhile ( ) ;i r . s e tCond i t i on ( n u l l ) ;

}

@Overridepub l i c void enterWrite stmt ( L i t t l e P a r s e r . Write stmtContext ctx ) {

St r ing [ ] params = ctx . getChi ld ( 2 ) . getText ( ) . s p l i t ( ” , ” ) ;i r . bui ldWrite ( params ) ;

}

@Overridepub l i c void enterRead stmt ( L i t t l e P a r s e r . Read stmtContext ctx ) {

St r ing [ ] params = ctx . getChi ld ( 2 ) . getText ( ) . s p l i t ( ” , ” ) ;i r . buildRead ( params ) ;

}

pub l i c SymbolTable getSymbolTable ( ) {re turn s ;

}

pub l i c void popSymbolTable ( ) {s = s . getParent ( ) ;i r . updateTable ( s ) ;

}

pub l i c void pushSymbolTable ( ) {s = s . c r ea t eCh i ld ( ) ;i r . updateTable ( s ) ;

}}

IRBuilder.java


pub l i c c l a s s IRBui lder {

9

p r i v a t e ArrayList<Str ing> i r l i s t ;p r i v a t e S t r ing cond i t i on ;p r i v a t e i n t labelNum ;p r i v a t e ArrayList<Str ing> l a b e lS t a c k ;p r i v a t e i n t regNum ;p r i v a t e S t r ing compIR ;p r i v a t e char dataType ;p r i v a t e SymbolTable currentTable ;p r i v a t e ArrayList<Str ing> s tack ;p r i v a t e ArrayList<Str ing> post f ixOutput ;p r i v a t e boolean inExpres s i on = f a l s e ;

pub l i c IRBui lder ( SymbolTable currentTable ) {i r l i s t = new ArrayList <>();labelNum = 1 ;regNum = 1 ;t h i s . currentTable = currentTable ;l a b e lS t a c k = new ArrayList <>();s tack = new ArrayList <>();post f ixOutput = new ArrayList <>();

}

pub l i c void updateTable ( SymbolTable s ) {currentTable = s ;

}p r i v a t e void c l ean ( ) {

cond i t i on = n u l l ;compIR = n u l l ;dataType = ’ ’ ;s tack . c l e a r ( ) ;post f ixOutput . c l e a r ( ) ;

}

p r i v a t e void tr imStack ( ) {ArrayList<Str ing> to t r im = new ArrayList<>(Arrays . a s L i s t ( ” ( ” , ” ) ” ) ) ;s tack . removeAll ( t o t r im ) ;

}

p r i v a t e void p r i n t L i s t ( ArrayList<Str ing> l i s t , boolean v e r t i c a l , boolean commented ){St r ing o r i e n t = v e r t i c a l ? ”\n” : ” ” ;S t r ing comment = commented ? ” ;” : ”” ;f o r ( S t r ing e l : l i s t ) {

System . out . p r i n t ( comment + e l + o r i e n t ) ;}i f ( ! v e r t i c a l ){

System . out . p r i n t l n ( ) ;}

}

p r i v a t e boolean isNumber ( S t r ing numberMaybe) {re turn numberMaybe . matches (”−?\\d∗\\ . ?\\d+”);

}

p r i v a t e boolean i s R e g i s t e r ( S t r ing numberMaybe) {

10

re turn numberMaybe . matches (”\\$T\\d+”);}

pub l i c void enterMain ( ) {i r l i s t . add (”LABEL main ” ) ;i r l i s t . add (”LINK” ) ;

}pub l i c void endProgram ( ) {

i r l i s t . add (”RET” ) ;System . out . p r i n t l n ( ” ; IR code ” ) ;p r i n t L i s t ( i r l i s t , true , t rue ) ;System . out . p r i n t l n ( ” ; t iny code ” ) ;TinyBui lder bu i ld = new TinyBui lder ( i r l i s t , currentTable ) ;bu i ld . p a r s e I r L i s t ( ) ;bu i ld . p r i n t ( t rue ) ;

}

pub l i c void bui ldWrite ( S t r ing [ ] params ) {char type ;f o r ( S t r ing param : params ) {

type = currentTable . searchSymbol ( param ) . getType ( ) .toUpperCase ( ) . toCharArray ( ) [ 0 ] ;

i r l i s t . add (”WRITE” + type + ” ” + param ) ;}

}

pub l i c void buildRead ( St r ing [ ] params ) {char type ;f o r ( S t r ing param : params ) {

type = currentTable . searchSymbol ( param ) . getType ( ) .toUpperCase ( ) . toCharArray ( ) [ 0 ] ;

i r l i s t . add (”READ” + type + ” ” + param ) ;}

}

pub l i c void parseComparison ( St r ing [ ] s e t ) {i f ( isNumber ( s e t [ 2 ] ) ) {

dataType = s e t [ 2 ] . conta in s ( ” . ” ) ? ’F ’ : ’ I ’ ;} e l s e { // get the data type from the v a r i a b l e

dataType = currentTable . searchSymbol ( s e t [ 0 ] ) . getType ( ) .toUpperCase ( ) . toCharArray ( ) [ 0 ] ;}St r ing op1 = elementToIR ( s e t [ 0 ] ) ;S t r ing op2 = elementToIR ( s e t [ 2 ] ) ;tr imStack ( ) ;i f ( ! isNumber ( op1 ) && ! i s R e g i s t e r ( op1 ) && currentTable .searchSymbol ( op1 ) == n u l l ) {

op1 = stack . get ( 0 ) ;}i f ( ! isNumber ( op2 ) && ! i s R e g i s t e r ( op2 ) && currentTable .searchSymbol ( op2 ) == n u l l ) {

op2 = stack . get (1 % stack . s i z e ( ) ) ;}setCompop ( s e t [ 1 ] ) ;

11

routeCondit ion ( op1 , op2 ) ;}

pub l i c void se tCond i t i on ( S t r ing cond ) {t h i s . c ond i t i on = cond ;

}

pub l i c void routeCondit ion ( S t r ing op1 , S t r ing op2 ) throws Nul lPo interExcept ion {i f ( c ond i t i on == n u l l ){

throw new Nul lPo interExcept ion (” Condit ion value i s n u l l ” ) ;} e l s e i f ( cond i t i on == ” i f ”){

bu i ld I fHeader ( op1 , op2 ) ;} e l s e {

buildWhileHeader ( op1 , op2 ) ;}c l ean ( ) ;

}

p r i v a t e void setCompop ( St r ing compop) {switch ( compop) {

case ”<”:compIR = ”GE” ;break ;

case ”<=”:compIR = ”GT” ;break ;

case ”>”:compIR = ”LE” ;break ;

case ”>=”:compIR = ”LT” ;break ;

case ”=”:compIR = ”NE” ;break ;

case ”!=”:compIR = ”EQ” ;break ;

d e f a u l t :// s h i t s broke yocompIR = n u l l ;

}compIR += dataType ; //append the dataType f l a g ?

}

p r i v a t e void buildWhileHeader ( S t r ing op1 , S t r ing op2 ) {i r l i s t . add (”LABEL l a b e l ” + labelNum++);l a b e lS t a c k . add (0 ,” l a b e l ” + labelNum ) ;l a b e lS t a c k . add (0 ,” l a b e l ” + ( labelNum − 1 ) ) ;i r l i s t . add (compIR + ” ” + op1 + ” ” + op2 + ” l a b e l ” + labelNum++);

}pub l i c void endWhile ( ) {

St r ing op1 = ”JUMP ” + l a b e l S t a c k . remove ( 0 ) ;S t r ing op2 = ”LABEL ” + l a b e l S t a ck . remove ( 0 ) ;

12

i r l i s t . add ( op1 ) ;i r l i s t . add ( op2 ) ;

}

p r i v a t e void bu i ld I fHeader ( S t r ing op1 , S t r ing op2 ) {l a b e lS t a c k . add (0 ,” l a b e l ” + labelNum ) ;i r l i s t . add (compIR + ” ” + op1 + ” ” + op2 + ” l a b e l ” + labelNum++);

}

pub l i c void ente rE l s ePar t ( ) {St r ing e l s e L a b e l = l a b e l S t a c k . remove ( 0 ) ;l a b e lS t a c k . add (0 , ” l a b e l ” + labelNum ) ;i r l i s t . add (”JUMP l a b e l ” + labelNum++);i r l i s t . add (”LABEL ” + e l s e L a b e l ) ;

}pub l i c void e x i t I f ( ) {

St r ing e l s e L a b e l = l a b e l S t a c k . remove ( 0 ) ;i r l i s t . add (”LABEL ” + e l s e L a b e l ) ;

}

pub l i c void ente rExpre s s i on ( ) {i nExpres s i on = true ;s tack . add ( 0 , ” ( ” ) ;

}

pub l i c void ex i tExpre s s i on ( boolean end ) {s tack . add ( 0 , ” ) ” ) ;popPostFixStack ( end ) ;

}

pub l i c void addElement ( S t r ing element ) {i f ( ! inExpres s i on ) re turn ;post f ixOutput . add ( element ) ;

}pub l i c void addOperator ( S t r ing op ) {

i f ( ! inExpres s i on ) re turn ;S t r ing post f ixOp ;switch ( op ) {

case ”+”:case ”−”:

i f ( s tack . isEmpty ( ) | | s tack . get (0 ) == ”(”)s tack . add (0 , op ) ;

e l s e {post f ixOp = stack . remove ( 0 ) ;post f ixOutput . add ( post f ixOp ) ;addOperator ( op ) ; //oh boy . . .}break ;

case ”∗” :case ”/” :

i f ( s tack . get (0 ) == ”∗” | | s tack . get (0 ) == ”/”) {post f ixOp = stack . remove ( 0 ) ;post f ixOutput . add ( post f ixOp ) ;

13

addOperator ( op ) ; //oh boy . . .} e l s e { // f o r ”+”,”−”, or ”(”

s tack . add (0 , op ) ;}break ;

d e f a u l t :System . out . p r i n t l n (”−−”);

}}

pub l i c void p r i n t P o s t f i x ( ) {System . out . p r i n t (” Stack = ” ) ;S t r ing stacky = ”” ;f o r ( S t r ing e l : s tack ) {

stacky = e l + ” ” + stacky ;}System . out . p r i n t l n ( stacky ) ;System . out . p r i n t (” Output = ” ) ;f o r ( S t r ing e l : post f ixOutput ) {

System . out . p r i n t ( e l + ” ” ) ;}System . out . p r i n t l n ( ) ;

}

p r i v a t e void popPostFixStack ( boolean complete ly ) {i f ( complete ly ) {

i f ( post f ixOutput . s i z e ( ) == 1){ s tack . add (0 , post f ixOutput . get ( 0 ) ) ; }e l s e {i nExpres s i on = f a l s e ;whi l e ( ! s tack . isEmpty ( ) ) {

i f ( s tack . get (0 ) == ”(” | | s tack . get (0 ) == ”)”) {s tack . remove ( 0 ) ;cont inue ;

}post f ixOutput . add ( s tack . remove ( 0 ) ) ;

}exprToIR ( ) ;}post f ixOutput . c l e a r ( ) ;

} e l s e {s tack . remove ( 0 ) ;whi l e ( s tack . get (0 ) != ”(”) {

post f ixOutput . add ( s tack . remove ( 0 ) ) ;}s tack . remove ( 0 ) ;

}}

p r i v a t e S t r ing elementToIR ( St r ing e l ) {i f ( isNumber ( e l ) ) {

dataType = e l . conta in s ( ” . ” ) ? ’F ’ : ’ I ’ ;i r l i s t . add ( St r ing . format (”STORE%c %s $T%d” , dataType , e l , regNum++));e l = ”$T” + (regNum−1);

}

14

re turn e l ;}

p r i v a t e void exprToIR ( ) {s tack . c l e a r ( ) ;i f ( isNumber ( post f ixOutput . get ( 0 ) ) ) {

dataType = post f ixOutput . get ( 0 ) . conta in s ( ” . ” ) ? ’F ’ : ’ I ’ ;} e l s e {

dataType = currentTable . searchSymbol ( post f ixOutput . get ( 0 ) ). getType ( ) . toUpperCase ( ) . toCharArray ( ) [ 0 ] ;

}St r ing a ;S t r ing b ;f o r ( S t r ing e l : post f ixOutput ) {

i f ( e l . equa l s ( ”+”) | | e l . equa l s (” −”) | | e l . equa l s ( ”∗” ) | e l . equa l s (”/”) ){b = elementToIR ( s tack . remove ( 0 ) ) ;a = elementToIR ( stack . remove ( 0 ) ) ;switch ( e l ) {

case ”+”:s tack . add (0 ,”$T”+regNum ) ;i r l i s t . add ( St r ing . format (”ADD%c %s %s $T%d” ,

dataType , a , b , regNum++));break ;

case ”−”:s tack . add (0 ,”$T”+regNum ) ;i r l i s t . add ( St r ing . format (”SUB%c %s %s $T%d” ,


case ”∗” :s tack . add (0 ,”$T”+regNum ) ;i r l i s t . add ( St r ing . format (”MUL%c %s %s $T%d” ,


case ”/” :s tack . add (0 ,”$T”+regNum ) ;i r l i s t . add ( St r ing . format (”DIV%c %s %s $T%d” ,


d e f a u l t :System . out . p r i n t l n (” Sh i t s broke yo . . . ” ) ;

}} e l s e {

s tack . add (0 , e l ) ;}

}}

pub l i c void assignmentStatement ( St r ing v a r i a b l e ) {dataType = currentTable . searchSymbol ( v a r i a b l e ) .getType ( ) . toUpperCase ( ) . toCharArray ( ) [ 0 ] ;S t r ing value = elementToIR ( stack . get ( 0 ) ) ;i r l i s t . add ( St r ing . format (”STORE%c %s %s ” , dataType , value , v a r i a b l e ) ) ;c l ean ( ) ;

}

15

}

TinyBuilder.java


pub l i c c l a s s TinyBui lder {

p r i v a t e ArrayList<Str ing> i r l i s t ;p r i v a t e SymbolTable t a b l e ;p r i v a t e ArrayList<Str ing> t iny ;p r i v a t e ArrayList<Str ing> vars ;p r i v a t e i n t regNum ;p r i v a t e char dataType ;

pub l i c TinyBui lder ( ArrayList<Str ing> i r l i s t , SymbolTable t ab l e ) {t h i s . i r l i s t = i r l i s t ;t h i s . t a b l e = t a b l e ;t iny = new ArrayList <>();vars = new ArrayList <>();regNum = 0 ; // f o r s t o r i n g the temps

}

pub l i c void p r in t ( boolean v e r t i c a l ) {p r i n t L i s t ( t iny , v e r t i c a l ) ;

}

p r i v a t e void p r i n t L i s t ( ArrayList<Str ing> l i s t , boolean v e r t i c a l ) {St r ing o r i e n t = v e r t i c a l ? ”\n” : ” ” ;f o r ( S t r ing e l : l i s t ) {

System . out . p r i n t ( e l + o r i e n t ) ;}System . out . p r i n t l n ( ) ;

}

pub l i c void p a r s e I r L i s t ( ) {f o r ( S t r ing opcode : i r l i s t ) {

checkForVar iable ( opcode ) ;opcode = opcode . r e p l a c e (”$T” ,” r ” ) ;S t r ing [ ] ops = opcode . s p l i t (” ” ) ;dataType = ops [ 0 ] . toLowerCase ( ) . charAt ( ops [ 0 ] . l ength ( ) − 1 ) ;dataType = dataType == ’ f ’ ? ’ r ’ : dataType ;

i f ( opcode . conta in s (”LABEL”) ) parseLabe l ( opcode ) ;e l s e i f ( opcode . conta in s (”STORE”)) par s eSto re ( opcode ) ;e l s e i f ( opcode . conta in s (”ADD”)) parseAdd ( opcode ) ;e l s e i f ( opcode . conta in s (”SUB”)) parseSub ( opcode ) ;e l s e i f ( opcode . conta in s (”MUL”)) parseMul ( opcode ) ;e l s e i f ( opcode . conta in s (”DIV”) ) parseDiv ( opcode ) ;e l s e i f ( opcode . conta in s (”JUMP”)) parseJump ( opcode ) ;e l s e i f ( opcode . conta in s (”LINK”)) {}e l s e i f ( opcode . conta in s (”READ”)) parseRead ( opcode ) ;e l s e i f ( opcode . conta in s (”WRITE”) ) parseWrite ( opcode ) ;e l s e i f ( opcode . conta in s (”RET”)) parseRet ( opcode ) ;

16

e l s e parseComp ( opcode ) ;}

}

p r i v a t e void checkForVar iable ( S t r ing opcode ) {St r ing [ ] ops = opcode . s p l i t (” ” ) ;f o r ( S t r ing p o t e n t i a l : ops ) {

Symbol sym = ta b l e . searchSymbol ( p o t e n t i a l ) ;i f ( sym != n u l l && ! vars . conta in s ( p o t e n t i a l ) ) {

foundVar iable (sym ) ;}

}}p r i v a t e void foundVar iable ( Symbol var ) {

St r ing v a r i a b l e = var . getName ( ) ;S t r ing type = ” var ” ;vars . add ( v a r i a b l e ) ;i f ( var . getType ( ) . equa l s (”STRING”)) {

v a r i a b l e += Str ing . format (” %s ” , var . getValue ( ) ) ;type = ” s t r ” ;

}t iny . add (0 , S t r ing . format(”%s %s ” , type , v a r i a b l e ) ) ;

}

p r i v a t e void parseLabe l ( S t r ing code ) {St r ing [ ] ops = code . s p l i t (” ” ) ;t iny . add ( St r ing . format (” l a b e l %s ” , ops [ 1 ] ) ) ;

}

p r i v a t e void par s eSto re ( S t r ing code ) {code = code . r e p l a c e (”$T” ,” r ” ) ;S t r ing [ ] ops = code . s p l i t (” ” ) ;i f ( ! ops [ 1 ] . matches (” r \\d+”) && ! ops [ 2 ] . matches (” r \\d+”)) {

t iny . add ( St r ing . format (”move %s r0 ” , ops [ 1 ] ) ) ;t iny . add ( St r ing . format (”move r0 %s ” , ops [ 2 ] ) ) ;

} e l s e {t iny . add ( St r ing . format (”move %s %s ” , ops [ 1 ] , ops [ 2 ] ) ) ;

}}

p r i v a t e void parseComp ( St r ing code ) {code = code . r e p l a c e (”$T” ,” r ” ) ;S t r ing [ ] ops = code . s p l i t (” ” ) ;S t r ing compop = ” j ” + ( ( St r ing ) ops [ 0 ] . subSequence (0 , ops [ 0 ] .l ength ( ) − 1 ) ) . toLowerCase ( ) ;i f ( ! ops [ 1 ] . matches (” r \\d+”) && ! ops [ 2 ] . matches (” r \\d+”)) {

t iny . add ( St r ing . format (”move %s r0 ” , ops [ 2 ] ) ) ;t iny . add ( St r ing . format (”cmp%c %s r0 ” , dataType , ops [ 1 ] ) ) ;

} e l s e {t iny . add ( St r ing . format (”cmp%c %s %s ” , dataType , ops [ 1 ] , ops [ 2 ] ) ) ;

}t iny . add ( St r ing . format(”%s %s ” , compop , ops [ 3 ] ) ) ;

}

17

p r i v a t e void parseAdd ( St r ing code ) {code = code . r e p l a c e (”$T” ,” r ” ) ;S t r ing [ ] ops = code . s p l i t (” ” ) ;t iny . add ( St r ing . format (”move %s %s ” , ops [ 1 ] , ops [ 3 ] ) ) ;t iny . add ( St r ing . format (” add%c %s %s ” , dataType , ops [ 2 ] , ops [ 3 ] ) ) ;

}

p r i v a t e void parseSub ( St r ing code ) {code = code . r e p l a c e (”$T” ,” r ” ) ;S t r ing [ ] ops = code . s p l i t (” ” ) ;t iny . add ( St r ing . format (”move %s %s ” , ops [ 1 ] , ops [ 3 ] ) ) ;t iny . add ( St r ing . format (” sub%c %s %s ” , dataType , ops [ 2 ] , ops [ 3 ] ) ) ;

}

p r i v a t e void parseMul ( S t r ing code ) {code = code . r e p l a c e (”$T” ,” r ” ) ;S t r ing [ ] ops = code . s p l i t (” ” ) ;t iny . add ( St r ing . format (”move %s %s ” , ops [ 1 ] , ops [ 3 ] ) ) ;t iny . add ( St r ing . format (”mul%c %s %s ” , dataType , ops [ 2 ] , ops [ 3 ] ) ) ;

}

p r i v a t e void parseDiv ( S t r ing code ) {code = code . r e p l a c e (”$T” ,” r ” ) ;S t r ing [ ] ops = code . s p l i t (” ” ) ;t iny . add ( St r ing . format (”move %s %s ” , ops [ 1 ] , ops [ 3 ] ) ) ;t iny . add ( St r ing . format (” div%c %s %s ” , dataType , ops [ 2 ] , ops [ 3 ] ) ) ;

}

p r i v a t e void parseJump ( St r ing code ) {St r ing [ ] ops = code . s p l i t (” ” ) ;t iny . add ( St r ing . format (” jmp %s ” , ops [ 1 ] ) ) ;

}

p r i v a t e void parseRead ( St r ing code ) {St r ing [ ] ops = code . s p l i t (” ” ) ;t iny . add ( St r ing . format (” sys read%c %s ” , dataType , ops [ 1 ] ) ) ;

}

p r i v a t e void parseWrite ( S t r ing code ) {St r ing [ ] ops = code . s p l i t (” ” ) ;t iny . add ( St r ing . format (” sys wr i t e%c %s ” , dataType , ops [ 1 ] ) ) ;

}

p r i v a t e void parseRet ( S t r ing code ) {St r ing [ ] ops = code . s p l i t (” ” ) ;t iny . add (” sys ha l t ” ) ;

}}

Section 2: Teamwork

For this capstone project we spent a lot of time working, in many different labs and on numerouscomputers. Because of this we used GitHub to maintain our files (as we were developing in NetBeans) as

18

well as serve as a version control platform. We also decided to use Overleaf to write all documents we neededfor this project because it provides version control and access from anywhere. In addition using the LATEXenvironment made some of the formatting and writing easier. During this project the three members of thisteam worked on the project in the following ways:

• Team member 1: Worked on all four steps of the project as well as all the writing for this course.Most of the time was spent working on parts 3 and 4, generating the symbol tables and subsequently thecode generation. This team member was also responsible for revising all the writing before submission.

• Team member 2: Worked on steps 1 and 2 as well as a little on step 3. They also spent some timewriting the report for the respective steps.

• Team member 3: Worked on step 4 with team member 1. In addition they spent most of their timeworking on writing and ensuring the code was up-to-date in our private GitHub repository.

As far as time spent on the project is concerned, if total time was 100% then the breakdown is as follows:

• Team member 1: 80%



Section 3: Design Pattern

This project did not require the creation of numerous objects, beyond the symbol table step, and becauseof that we did not really use a design pattern. One could make the argument the ANTLR uses some form ofthe Factory pattern when it creates the numerous files we used throughout the project. Another argumentcould be made that we used a form of the Factory pattern when we created the Symbol class, as we weregoing to use numerous symbol objects to populate our symbol tables. However, we did not implement the“factory” that is typically called to generate those objects. Rather, we just used constructor calls in ourListener class.

Section 4: Technical Writing

Below is a copy of the final report document we generated for this project.

19

Building a Compiler with JavaMontana State University

Spring 2018

Mitch Baker, Tyler Sassaman,Steven Thompson

April 24, 2018

1 Introduction

The intent of this project is to develop acomplete compiler for the LITTLE program-ming language, which is a toy language basedon the MICRO language defined in Crafting aCompiler in C, written by Charles Fischer andDee-Ann LeBlanc. In short we were able tobuild a compiler which converts the LITTLEcode into an assembly code capable of beingexecuted on a computer.

This project was intended to be done usingthe Oxiago Compiler Factory or OCF, an on-line development environment which supportsJava and Python. Since we are implementingour compiler using Java we were also able toget everything running in the NetBeans IDEwhich provides more functionality than OCF,allowing for easier development.

The following sections will discuss the pur-pose of a compiler, the requirements for a com-piler, our implementation of a compiler, as wellas our thoughts on future work related to ourcompiler.

2 Background

A compiler is a piece of software that en-ables high level languages such as C, C++, andJava to run on computers. Since computersonly understand binary, it makes sense that aprogram written in C, for example, can’t di-rectly run on a computer. Rather, it requiressoftware to translate the higher level language

into an assembly language (often called ma-chine code). While assembly may not be a bi-nary language, it is a language which computerhardware can understand. An additional, andoften overlooked, function of most compilers iscode optimization. As a compiler generates as-sembly code it is able to regenerate the codeallowing for faster and more efficient execution.

Perhaps one of the greatest benefit of com-pilers, when compared to just writing in assem-bly, is the code portability it provides. Sincethe mid to late 20th century when compil-ers began to grow in popularity programmers,have not needed to write architecture specificprograms. Before this time, a program writ-ten to run on an Apple computer would notbe compatible with an IBM machine. Compil-ers allow programmers to write universal pro-grams where the targeted architecture can beadjusted by simply using a different compiler.

A compiler will typically take 5 steps: scan-ning, parsing, symbol table generation, seman-tic routine assignment, and code generation.While these steps will be discussed in furtherdetail later on, it’s important to know thatsome compilers may combine two or three ofthe steps. In our implementation of a com-piler we will combine symbol table generation,semantic routine assignment, and code gener-ation into a single step.

Since we are limited in time we will bebuilding a compiler for a toy programming lan-guage, LITTLE, rather than a full scale pro-gramming language. (We will also be using a

1

pseudo assembly language, TINY, for our fi-nal step.) That being said there are numer-ous compliers which allow the translation ofpopular and not so popular programming lan-guages. Some examples include: the GCCcompiler for C which was developed by theGNU project, the javac OpenJDK compiler de-veloped by Sun Microsystems(now owned byOracle), the PyCharm compiler for Pythonwhich was written in C, the CINT compilerfor C++ developed by CERN, and the Open64compiler for Fortran developed by a coalitionconsisting of Google, HP, Intel, Nvidia, andPathScale amoung others.

3 Methods and Discussion

Building a compiler in any programminglanguage requires numerous steps, each ofthem complicated in their own respects. Eachstep in this process required us to analyze whatwe needed to do, what tools we had at ourdisposal, and what specific structures or ap-proaches we would use in the implementation.In the following discussion we will address therequirements of each step, our approach to im-plementation, any design choices we made, andany issues we encountered.

3.1 Scanner

Scanners, sometimes referred to as Lexers,are the first step in the compiling process. Acompiler takes as input some string of charac-ters and then feeds that input stream to thescanner. The scanner then converts the in-put stream into an output sequence of tokens,this process is called lexical analysis. The se-quence of tokens is generally representative ofthe parts of your language, be it literals, key-words, identifiers, operations, or whatever youneed.

Scanners generally perform the same taskregardless of the compiler or platform you areworking with. In order to build a scanner youneed to know two pieces of information; howyou define a token and how you can recognize

a token from an input stream. One way to de-fine tokens is with regular expressions, somerefer to regular expressions used in a scanneras lexemes. Dr. Randal Nelson at the Univer-sity of Rochester defines a regular expressionto be “an algebraic formula whose value is apattern consisting of a set of strings, called thelanguage of the expression.”[1] Regular expres-sions are used to match tokens as while read-ing from an input stream. To match an inputstream to a regular expression a scanner justreads from the input and compares each newcharacter to the rules. If you have a matchthen it is included as a token, if not, it passesover the unmatched string often throwing anerror (just as you would see in the Java com-piler).

The first step to implementing our scannerwas to write several regular expressions whichrepresent the tokens given in the project de-scription. See Appendix A for a complete list-ing of our regular expressions. Rather thanwriting a scanner from scratch for this project,we utilize a scanner generator. At a high levela scanner generator is a tool which takes agrammar file as input, containing token defi-nitions, and generates a scanner based on thegrammar. Since we are using Java to writeour compiler, we will be using Another Toolfor Language Recognition or ANTLR (version4.7.1), a tool that is able to generate scannersas well as parsers.

After generating the grammar file, we hadto edit a Java driver file, which takes in-put from a file (representative of the LITTLEprogramming language we are using for thisproject) and runs that input through our scan-ner. We then took any tokens that were rec-ognized by the scanner and printed them tothe system output, as well as the particularvalue of that token. For example, our outputfor a float literal token if the value was ‘19.008’would look like Figure 1.

Figure 1: Sample output

While this was part of the first deliverable re-

2

quirement, this output formatting also provedadvantageous as it made debugging our drivereasier.

Perhaps one of the more difficult aspectsof creating a scanner was generating the gram-mar file. While it was not necessarily diffi-cult to generate the regular expressions whichrepresented the tokens, the regular expressionformatting that we used in class differed fromthe formatting which is accepted by ANTLR.Some of the operations that were discussed inthe lecture slides were not supported or pro-duced warnings in the ANTLR grammar file,in particular using ranges proved problematic.In our review of regular expression it was un-derstood that [a-z] allowed for any lowercasecharacter from a to z. While [A-Z] allowedfor any uppercase character from A to Z, wewere also able to combine ranges to includeall characters from a-z both upper and lower-case, this would look something like: [a-zA-Z].However when we took that regular expressionto our grammar file we encountered an errormessage stating that [a-z] was not a recognizedcommand. After some research in the ANTLRdocumentation [2] we determined that the cor-rect way to include a range of characters waswith the [A-z] command.

While writing the grammar file proved tobe more involved than just adding regular ex-pressions line by line, it took us a while torealize that ignoring whitespace was causingsome problems with our token file generation.The first couple of times that we generatedthe grammar documents we were getting backstrange output, until we read in the documen-tation that we need to provide ANTLR withsome directions regarding the spaces betweentokens. This turned out to be as simple asadding a regular expression to our grammarfile that matched a white space and then skipover it.

After working through this and some othersmall issues we were able to generate our gram-mar file, which in-turn allowed ANTLR to cre-ate the numerous Java files that are used inour driver. While writing the driver for ourscanner it took us some time to figure out the

interactions with the many files that ANTLRproduced after executing our grammar file. Wespent some time looking through the ANTLRdocumentation and were able to get our diverto properly interact with the tokens file. Wewere able to utilize some of the packages thatcan be imported into Java to generate charac-ter streams and token streams from the inputfile. As we read through the input file we com-pared each token with the regular expressionsthat we defined. Even though we were able todetermine what tokens we had, we still neededa way to access the data for that particular to-ken. In order to achieve this we used a hashmap type of data structure. When we finallygot to the point where we could print out val-ues, we navigated through this structure to ac-cess both the token type and the value of thatparticular token. Formatting the output cor-rectly was not that hard, the first time we ranthe program we had an extra line break presentso the program did not see it as correct. Thiswas the easiest fix in all of the designing andprogramming we did for the scanner.

After generating regular expressions to rep-resent our tokens, working through issues withgrammar file generation, and debugging ourour driver, we were able to write a scannerthat successfully read an input file and gen-erated tokens by matching our regular expres-sions found in the grammar file.

3.2 Parser

The second stage in the compiling processis parsing. A parser has two main functions;the first is to determine if an input string isvalid, and the second is to determine the struc-ture of a program. To complete these functionsa parser takes as input some file (in our case itis a program written in Little) and while read-ing through it compares the input to a set ofrules given in the grammar file. When check-ing the validity of an input string, the parseractually determines if the string is grammati-cally correct. In order to determine if an inputstring is grammatically correct, we first need away to represent the correct grammar for our

3

LITTLE language.To represent a languages grammar we use

a combination of regular expressions (whichare used to scan the input) and context freegrammars. A context free grammar is a setof recursive rewriting rules (or productions)used to generate patterns of strings.[3] Con-text free grammars all consist of the same fourparts; a set of non-terminal symbols(Vn), a setof terminal symbols(Vt), a set of all possibleproductions(P ), and a start symbol(S). Notethat a production is a rule which takes the formof a non-terminal and a list of terminals andnon-terminals connected with an arrow (seebelow for an example). We generally write thisusing set notation: G = (Vt, Vn, S, P ). Usuallycontext free grammars will take on the follow-ing form in Figure 2 where S is the start sym-bol, {A,B} is the set of non-terminal symbols,and {a, b} is the set of terminals.

Figure 2: Sample production

It is important to note that the empty stringsymbol(λ) is not considered a non-terminal ora terminal. While this is a very simple exam-ple, the specific grammar that we used for theLittle language consists of 41 non-terminals,40 terminals, and 71 productions. Althoughwe were given the basic context free grammarspecification for this language, we still neededto modify the code so that it can be used inthe ANTLR environment.

Now that we are able to define our languagewe still need a way to determine if some inputcomplies with the rules, or grammar, that wespecified. To parse our Little program files weagain use ANTLR, which provides us with thesame information we used in scanning as wellas some additional parsing files. When we ex-

ecute the grammar file, ANTLR generates theLittleLexer.java, Little.tokens, and LittleList-ner.java files we had in the scanning step,while also providing us a LittleParser.java file.The parser file that is generated by ANTLR isquite large at approximately 2400 lines of codeand includes numerous methods and variables.However it is relatively simple to use, we justneed to build up a parser in our driver andthen call a method which executes the parsingstep. After we executed the parser file we wereable to generate output based on queries whichsatisfy the requirements of this step.

Building the parser in our driver file con-sists of 4 steps (See Appendix B for our Javacode):

1. The first step in building up the parseris taking the input file and converting itto a character stream. This allows us tolook at each element in the file and com-pare it to the specifications in our gram-mar file.

2. The second step in building the parserinvolves the instantiation of a lexer us-ing the character stream we generated inthe first step. This lexer is used to verifythat the language consists of correct to-kens based on the regular expressions inour grammar file.

3. The third step to building a parser isto gather the tokens recognized by thelexer. This is achieved using a commontoken stream with the lexer we createdin step 2 passed as input.

4. The fourth and final step in building aparser is to actually create an instance ofthe LittleParser.java file that was auto-matically generated by ANTLR. Instan-tiating a parser requires a token streamas input, so we passed the same tokenstream that we generated in previousstep.

Now that we have instantiated a parser inour driver, we are able to call an executionmethod, in this instance its program(), and

4

use it to drive some output. Since this stepof our project required that we simply outputthe final status of the program (accepted ornot accepted), we needed a way to determineif any errors occurred while parsing the file.While there may be some more elegant or effi-cient way to check for errors, we opted to uti-lize the getNumberOfSyntaxErrors() methodprovided in the parser class. This method sim-ply returns an integer value corresponding tothe number of errors found during parsing.Therefore, if the file was successfully parsedthe value returned would be 0. Knowing this,it was simple enough to capture the result ofthat method call in an if/else statement. SeeAppendix B for our Java implementation ofthis conditional.

While it proved simple enough to deter-mine if a file was successfully parsed, we ranin to ANTLR’s default error recovery; that isto print parsing the errors directly to the ter-minal. This may be a good way to handleerrors in actual compilation but it was prob-lematic for this step of the project since weonly wanted to print “accepted” or “not ac-cepted”. Our program still printed the correctresult of the parsing, however, the additionaloutput was recorded as wrong since the diff

command is used to grade in the Oxiago Com-piler Factory.

This is one aspect of implementing a parserwhere we struggled. We spent a lot of timereading through the LittleParser.java file try-ing to figure out where the default error recov-ery was taking place. In so doing, we learnedthat removing the error recovery stages fromeach method in the parser class caused our pro-gram to crash every time it encountered inputthat does not conform to our specifications.In reading the ANTLR documentation we re-alized that some of the error recovery takesplace in the LittleLexer.java file that was gen-erated in the steps where we instantiated ourparser. We spent some time researching waysto mute the error listeners in our parser, andwe found the solution on stack overflow.[4] Inorder to prevent the error messages from print-ing to the terminal we had to remove an error

listener from the instance of our parser. Thiswas done using a simple method call on ourparser instance. See Appendix B for the spe-cific method call we used to remove the errorlistener.

Another small issue we encountered was inthe generation of our grammar file. Specifi-cally, converting the context free grammar wewere given, into something that the ANTLRprogram could understand. We again had toreference the ANTLR documentation to figureout how to properly format context free gram-mars in an ANTLR grammar specification file.It was a simple matter of converting some ar-rows (→) to colons (:). We also realized thatin copying the provided context free grammarto our grammar specification file, we failed tocopy one production of an expression. Lackingthis production caused our grammar file to failduring execution. Once we realized the errorand added the production back into our file,everything started working again.

Only after we were able to successfully gen-erate a grammar specification file and resolvethe defaulted error recovery actions in ourparser were we able to take a LITTLE programfile as input and successfully parse it, providingthe results of that parsing to the terminal.

3.3 Symbol Table

The third stage in the compiling processis the construction of a symbol table. A sym-bol table is an important data structure “cre-ated and maintained by compilers in order tostore information about the occurrence of var-ious entities, such as variable names, functionnames, objects, classes, and interfaces.”[5] Asymbol table is associated with each scope ofa program (where it stores the declared at-tributes as a name and type). The scope beingthe binding of the program where particularvariables are valid. Some of the scopes foundin our LITTLE language include global (existsthrough the entire program), loop blocks (ex-ists only in that loop), and if statements (existsonly in the clause). While symbol table gen-eration is considered a step in the compilation

5

process, there are no execution actions takenin this step. We are actually creating a toolthat will help us in the semantic action andcode generation process.

It is important to note that managing thenumerous symbol tables you may have in aprogram can be difficult (one for each scope).Typically a list of all tables is kept using a stackto maintain a pointer to the most current ta-ble. Following this design, it makes accessingdifferent tables and adding to the stack as sim-ple as some push and pop method calls.

In order to implement a symbol table, weneed a way to step through the parsed pro-gram file, a way to store the data we find,and a way to manage all our tables. Becausewe are using ANTLR, we have access to anauto generated LittleVisitor.java and a Lit-tleListener.java class which provide a way tomove through each part of the LITTLE pro-gram. We opted to use the Listener class be-cause there seemed to be more resources avail-able to assist with trouble shooting. In addi-tion, our ANTLR program does not automat-ically generate a Visitor class by default, wewould have had to change some configurationsetting in order to get the LittleVisitor.javaclass. We decided to use an ArrayList as ourdata structure to maintain a list of our symboltables. Using the ArrayList provided us withsome built in Java operations, requiring less“back end’ implementation (as we felt this wasoutside the scope of this project). Our indi-vidual symbol tables were stored using a HashMap containing two strings corresponding tothe name of the artifact and its data type.

After we decided what tools and structureswe were going to use to implement the symboltables, it was time to write some code. UsingANTLR we created a ParseTreeWalker whichprovided a way to move through the programand call methods in the Listener class. Thiswas as simple as passing the parser we gen-erated in the previous step as input to ourSymbolTableBuilder, an extension of the Lis-tener class. Inside the Listener class we areprovided numerous methods that are called onentrance to and exit from each scope in our

LITTLE language. All we needed to do was de-termine which methods we needed to use andoverride them with our specific implementa-tion. After reading the problem statement, weconcluded that we needed to modify 13 of the88 methods provided. The methods we modi-fied are:

• enterPgm Body()

• exitPgm Body()

• exitString decl()

• exitVar decl()

• enterParam decl()

• enterFunc decl()

• exitFunc decl()

• enterIf stmt()

• exitIf stmt()

• enterElse part()

• exitElse part()

• enterWhile stmt()

• exitWhile stmt()

Implementing the above methods proved fairlysimple once we understood how they werecalled by the ParseTreeWalker. All we neededto do was determine what information we hadon the entrance to a scope, as well as, whenwe left the scope. Each function was passed acontext which came from our parser, giving usaccess to the names and types of variables alsocalled artifacts in each scope. For an imple-mentation of one of the above methods, pleasesee Appendix C.

The problem statement we were given in-cluded some suggestions and steps we shouldtake. One of those suggestions was that weuse a method call to print out our symbol ta-bles after walking through the entire program.While this would not have been too difficult toimplement, we found it much easier to printour tables as we were adding elements. Oneargument for this approach is that it requiredless traversals of our symbol table list, which

6

could possibly improve the time complexity ofour compiler.

Implementing the symbol table for ourLITTLE language proved more challengingthan we had anticipated. There were manyaspects of this implementation that allowed usto decide which tool to use. For example, wecould have used the Listener or Visitor class,we could have used any number of data struc-tures to store our tables, and we could havehandled the printing of our tables in a coupledifferent ways. With all of this in mind, therewere a few aspects that we struggled with orchoices we made that proved problematic.

Perhaps the most challenging aspect of thesymbol table was figuring out how to save thedata as we moved through the different scopesof the program; after all there are many dif-ferent ways to store the data. One consid-eration that went into our final decision wasthe fact that we were, in essence, storing twopieces of data, the name of a variable and itstype. The way we handled this was to usea Hash Map, holding two fields (we opted tostore both fields as strings to make printingeasier). This Hash Map was then insertedinto an Array List which represented the stackpointer to the most recent symbol table or cur-rent scope. Even though we knew how wewere going to store the data, we found mov-ing through the program and figuring out howto access the variables and their types from theListener class was, in-itself, a challenge. Aftersome reading in the ANTLR documentationand looking at some examples of Listener classimplementations, we figured out how to inter-act with the methods in the Listener class. Itactually turned out to be easier than we weretrying to make it.

Another aspect we struggled with was errorhandling. This part of the project required usto print only a message “DECLERATION ER-ROR”. Since we first tried to print the symboltable as we generated it we were finding the er-rors, but only after we had printed everythingup to that point. To correct this we createda duplicate SymbolTableBuilder class whichdid not include any print statements. Then

we ran through our program with out printingand checked for any errors. If we found an errorour conditional would only print the error mes-sage, if no error was found we would call theSymbolTableBuilder class with the printingmethod. Since we were close to the deadlinewe left well enough alone. However since thedeadline we were able to implement a separateprint function rather than our temporary fix.

Once we determined the steps we needed totake, the tools we were going to use, and fixedthe problems we encountered, we had a work-ing symbol table builder. Using the scannedand parsed LITTLE program file, we success-fully identified the scopes of the program, thevariables present in each scope, and built asymbol table which correctly corresponds tothe input program.

3.4 Code Generation

At this point in the compiling process,we have completed all the necessary pre-processing needed to assign some actions tothe code, which is what our final goal is...to dosomething. In order to get the LITTLE pro-gram to do something, we need to generate in-structions the computer can understand. Thisis achieved through semantic actions, whichare routines called as productions, or parts ofproductions, are recognized. Semantic actionsare usually found in the productions of a gram-mar and work together to build up an instruc-tion. There can be numerous actions found ina single production. Semantic actions typicallytake the form found in Figure 3

Figure 3: If-Else semantic routines

The above production is an example of anif-else statement, where #startif is the se-mantic routine. With the help of semantic ac-tions and the routines associated with them,we are able to generate an intermediate in-struction set, in this case we will use 3-AddressCode or 3AC.

7

Once we have the intermediate instructionscreated and stored in our program, we can con-vert those instructions into an Instruction SetArchitecture or ISA, for the appropriate oper-ating system we are running. Since we are run-ning our compiler on the OCF framework, wewill be using a toy assembly language TINY toexecute our code. (TINY creates executablesthat run on the Sun architecture).

Before we began coding this step, we tooksome time to determine what we would needand how each part of our program would in-teract. We decided that our implementationwould use an IR construction class to buildand store the intermediate set of instructionsthat we generate. In our implementation theinstructions are generated as we walk throughthe LITTLE program, using the same steps asthe previous part of this step. Because we didnot initially get our previous step working asbest it should have been, we had to spend sometime generating the symbol table. In so doing,we created a class that represents a symbol, aswell as, a class that represents a symbol table.

Our IR builder class generates the symboltable as the listener moves through the LIT-TLE program adding symbols to symbol ta-bles, where they are stored in hash maps. Wechose to use hash maps as they allow us tostore more data in a more efficient manner. Be-cause of our design choices we were able to keepthe symbol table generation inside the Listenerclass, reducing the time complexity of the walkoperation.

Once we had the symbol tables built, wehad our intermediate instruction set, all wehad to do was convert that intermediate rep-resentation into the TINY code that the OCFframework could execute. This involved theuse of a TINY builder class which movedthrough the stack of instructions we createdin the IR builder. For each instruction itencountered we assigned a TINY instruction,as well as, the appropriate operands, this iswhere we ran into some problems. We hada hard time determining which TINY instruc-tions we needed to assign for each routine thatwe stored in our instruction stack. For exam-

ple, if we encountered an if-else statementin the LITTLE code our intermediate instruc-tions did not directly convert to TINY code.We had to write a few methods that would gen-erate the correct order of instructions becausean if-else statement in LITTLE is less com-plicated than an if-else statement in TINY.Some of the things we had to account for are:

• We needed to save each of the valuesin the conditional to registers so thatwe could use the comparison instructionprovided in the TINY ISA.

• We needed to create labels for the if

part of the conditional as well as the elsepart, if there was none we needed a wayto branch to the next instruction.

• We needed to perform the correct opera-tions inside of each block in the if-else

statement and then branch to the nextinstruction.

While the above consideration may be specificto an if-else statement, the same types ofconsiderations and decisions need to happenfor each type of instruction block in a program.Figure 4 shows the conversion of a LITTLEif-else statement into TINY code.

Figure 4: Converting LITTLE to TINY

Another problem that we ran into while im-plementing the code generation of our compilerwas the use of deprecated operations in theANTLR API. The first few times we tried to

8

run the code we had written, we kept gettingcompile errors asking us to check the console,however the console was blank. Since we foundsome of the operations we were using in theANTLR documentation, that was not the firstthing we checked. After struggling for a whileand contacting the OCF developers, we wereable to determine that some of the ANTLRfunctionality was no longer supported in theway we were using it. While it was frustrating,it provided us the opportunity to work aroundour problems in another way, we learned somenew things about ANTLR. It also taught ussomething about debugging, sometimes its notalways your code that’s wrong.

Perhaps the most complicated part of thisstep was generating the intermediate instruc-tions for the control structures we encounteredin LITTLE such as for loops, while loops,and if-else statements. We spent the major-ity of our development time on this aspect ofcode generation, however, once we got it fig-ured out converting the intermediate instruc-tions to TINY was not hard at all. What was alittle misleading was the difference in 3AC in-structions that we generated compared to whatwere expected. This caused us to spend sometime reworking our code when we might nothave had to.

Once we were able to successfully buildour symbol tables, we could generate the in-termediate instructions for the LITTLE code.Then, after converting to TINY, we were ableto properly execute the instructions generatedand obtained the desired results for all the testcases.

3.5 Full-fledged Compiler

Creating a compiler which successfullycompiles a LITTLE program was not easy andincluded numerous steps, perhaps even morethan the four main steps we discussed. Whileusing ANTLR aided our development, it def-initely proved challenging and intimidating inits own way. ANTLR provides numerous toolsand operations that we did not use, and evenmore that we were not aware of, but once we

figured out how to use it for our program, itbecame a huge help.

Before we began coding the last part of thisproject, we took some time to look at the prob-lem, what tools and information we had, andfigured out the best way to create compiler.Once we determined what classes we wouldneed to successfully implement the code gener-ation and how they would interact, we gener-ated a Unified Modeling Language, UML, classdiagram of our complete system. The purposeof this model is to show how we plan to cre-ate components and interface them together.It serves as a plan for how we want to imple-ment the system. Figure 5 is the simple modelwe developed for our system. (We did not in-clude attributes or operations as we were notsure what we were going to use at that stageof the development process) It’s important tonote that our final program is a little differentthan the model we developed, as we encoun-tered some problems that we did not considerin the planning stage.

Figure 5: UML diagram of our compiler

When designing our compiler, we did notnecessarily use any design patterns, part of thereason being that we were not creating numer-ous objects. Rather we were using the objectsand interfaces that ANTLR generated for us.One could loosely say that the factory pat-tern was used and call ANTLR the factory.That being said, we did consider things likespace complexity, time complexity, and main-

9

tainability as we worked toward good program-ming practices.

On the idea of maintainability, we designedour driver to make only one method call, thewalk() method. The driver should never haveto be updated, and you can easily trace howthings are executed by following the call. OurListener is responsible for generating the syn-tax tables, intermediate instructions, and facil-itating the TINY code generation. This givesthe Listener the power to use the classes andmethods that it needs. We determined thatthis would allow for better time complexity.Since we are only walking through the LIT-TLE code one time, we have much better timecomplexity than a program that uses multiplefunction calls on the LITTLE program fromthe driver.

Some of the important design decisions wemade include:

• Deciding to use the Listener class overthe Visitor class provided by ANTLR.Because we used the visitor class we didnot need to change any ANTLR settingsto generate a Visitor class.

• Deciding not to use stacks or lists in ourfinal symbol table generation design. Werealized that maintaining indices and ac-cessing the information would be morecomplicated than we needed, especiallyas the size of the lists or stacks grew.

• Deciding to use a tree over a table for ourSymbol Table. This allowed us to easilydebug, as well as made the intermediateinstruction generation much easier.

• Deciding to use a stack for storing ourintermediate instructions. This decisionperhaps made things a little more com-plicated because order is important in in-structions, but difficult to maintain in astack.

One of the decisions we made helped withspace complexity and for storing our symboltables we used a hash map. Hash maps aremore efficient than some of the other data

structures we could have used and they allowus easy access to the data without multiple it-erations.

Through out the development of our com-piler we found ourselves using many differentcomputers. Using NetBeans as our IDE madedevelopment easier, but it created the problemof taking the project files to each new computerwe used. To handle this, and to allow for betterversion control, we put our project on GitHub.This allowed us to track the changes we weremaking along the way, as well as, giving us ac-cess to the repository from any computer weused.

Writing this paper also proved challenging,as we were using many different computers anddifferent operating systems. On top of that,writing a technical paper in Microsoft Wordcan prove to be challenging. Rather than fightthe program we opted to use Overleaf, a LATEXprocessing application. Overleaf gave us ac-cess to our report anywhere we were workingon our program and it also allows for collabora-tive writing, while maintaining a version con-trol system.

4 Conclusion/Future Work

We were able to create a compiler thatsuccessfully took a LITTLE program and gen-erated assembly instructions, which were exe-cuted using Sun Architecture. While we con-sider this program a success, there is alwaysroom for improvement and more efficient de-sign. Were we to have more time and resourcesthere are some things that we would like tochange and implement.

One of the changes we would like to do isto make our compiler more efficient in regardsto both space and time complexity. There aresurely more efficient approaches to construct asymbol table and Abstract Syntax Tree, whichwould make code generation more efficient.This is something that would require some re-search and lots of trials, but its something thatcan only improve our current design.

Something that we would like to try to im-

10

plement is instruction scheduling. While theprocessors on many computers are quite ef-ficient at scheduling instructions, we believethat the way to be most efficient is to createyour assembly code the “smart” way. That is,to consider scheduling when you generate yourcode. This is something that could greatly in-crease the efficiency of the program running ona computer.

Another aspect of a compiler that we didnot implement, but would be interesting to do,is code optimization. The code we generatedis by no means the most efficient code for theprograms we were executing. It would be chal-lenging, but interesting, to see how efficient wecould get the TINY code we pass to the com-puter for execution.

In conclusion, this project forced us to lookat a different type of programming. It requiredus to fully understand the problems at eachstep, and to design solutions that can be usedlater on in development. Perhaps one of thebiggest take-aways from this project is that youcan accomplish big goals and huge programs ifyou break them down and solve one part ata time. We were able to successfully design asolution each step of the way and because ofthat, we have a working compiler for the LIT-TLE language.

References

[1] Randal C. Nelson, “Regular Expressions.”Slides from CSC 173, University ofRochester.

[2] Terence Parr, “ANTLR 4 Documen-tation.” https://github.com/antlr/

antlr4/blob/master/doc/index.md.

[3] Randal C. Nelson, “Context-Free Gram-mars.” Slides from CSC 173, University ofRochester.

[4] uwolfer, “ANTLR 4: Avoid error printingto console.” https://stackoverflow.

com/questions/25990158/

antlr-4-avoid-error-printing-to-console,February 2016.

[5] Tutorials Point, “Compiler Design- Symbol Table.” https://www.

tutorialspoint.com/compiler_design/

compiler_design_symbol_table.htm.

Appendix A

From the project specification document wewere able to derive the following regular ex-pressions that represent tokens for identifiers,integer literals, float literals, string literals,comments, keywords, operators, and whitespaces.

• Comment: any string following a doubledash that ends at the new line.COMMENT: ‘--’(.*?)‘\n’->skip;

• Identifier: a letter followed by any num-ber of letters and number, identifiers arecase sensitive.IDENTIFIER: [A-z]([0-9][A-z])*;

• Float Literals: floating point numbersavailable with values after the decimaland optional values before the decimal.FLOATLITERAL: [0-9]*‘.’[0-9]+;

• Integer Literal: any integer numberINTLITERAL: [0-9]+;

• String Literal: any sequence of charac-ters except ‘”’ between ‘”’ and ‘”’.STRINGLITERAL: ‘"’(.*?)‘"’;

• White Space: any space between any twotokens (this will be thrown away).WHITESPACE: [\t\r\n]+->skip;

• Keywords: any of the given words. Wejust used the or operator ‘|’ and com-bined all the given keywords in a regularexpression.

• Operators: any of the given operators.We just used the or operator ‘|’ and com-bined all the given operations in a regularexpression.

11

https://github.com/antlr/antlr4/blob/master/doc/index.md

https://github.com/antlr/antlr4/blob/master/doc/index.md

https://stackoverflow.com/questions/25990158/antlr-4-avoid-error-printing-to-console



https://www.tutorialspoint.com/compiler_design/compiler_design_symbol_table.htm



Appendix B

Below are some segments of code from ourdriver that were specifically used in the gen-eration and implementation of a scanner.

• Here is a snippet of code from our driverused to instantiate a parser through thefour steps that were outlined above.

CharStream str = CharStreams.

fromFileName(fn);

LittleLexer lex = new

LittleLexer(str);

CommonTokenStream tok = new

CommonTokenStream(lex);

LittleParser parser = new

LittleParser(tok);

parser.program ();

• Here is a snippet of code from our drivershowing the comparison we used to de-termine if any errors had occurred in theparsing process. This code also providesthe output to the terminal.

if(parser.

getNumberOfSyntaxErrors ()

== 0){

System.out.println("

Accepted");

} else {

System.out.println("

Not Accepted");

}

• Here is a snippet of code from our driverclass that shows how we removed the er-ror listener from our parser to stop theerrors from printing to the terminal.

parser.removeErrorListener(

ConsoleErrorListener.

INSTANCE);

The intent of the above code is not to give youa comprehensive look at our driver, rather itis included to provide you with our approachto solving specific aspects of our parser imple-mentation.

Appendix C

Below is our implementation of theexitVar decl(), which is capable of handlingsingle variable decelerations as well as multiplevariable decelerations.@Override

public void exitVar_decl(LittleParser.Var_declContext ctx

){

HashMap <String , String > currScope = peek();

String id = ctx.getChild (1).getText ();

if (currScope.containsKey(id)) { // checking for

duplicate

error = true;

errors.add(id);

System.out.println (" DECLARATION ERROR " +

id);

} else {

String type = ctx.getChild (0).getText ();

String [] mulVar = id.split (",");

if (mulVar.length >= 1) { // multiple

variable declaration

for (int i = 0; i < mulVar.length

; i++) {

currScope.put(mulVar[i],

type);

System.out.println ("name

" + mulVar[i] + "

type " + type);

}

return;

} else { // only a single variable

declaration

currScope.put(id, type);

System.out.println ("name " + id +

" type " + type);

}

}

}

The intent of the above code is not togive you a comprehensive look at ourSymbolTableBuilder class, rather its to showone example of how we approached the con-struction of a symbol table. The steps we usedin this method are very similar to those weused in other methods, however this is the onlymethod that handles multiple variable deceler-ations.

12

Section 5: UML

Since we were not truly sure how this project was going to look near the beginning of the course we didnot generate a UML diagram right away. Another reason we did not immediately develop a UML diagramis the fact that the first 2 steps of the project required no design considerations and left little room formodification. However once we completed the third part and before we began the fourth part we tooksome time to consider how the project would look. That is, we created a UML diagram representing theinteractions between our classes and interfaces. Figure 1 is the model we developed.

Figure 1: UML diagram of our compiler

Section 6: Design Trade-offs

To help reduce time complexity and increase the space efficiency of our program we decided to useHash Maps to store the symbols inside of our symbol tables. Doing so allowed us to keep the majority ofthe symbol table generation inside of our Listener class. One major setback we encountered because of thischoice was in the generation of the intermediate instructions. Because of our storage choice the code togenerate these instructions became too complex to leave inside the Listener class, so we had to create the IRbuilder class. If we would not have done that our code would have been too complex to read and understand,let alone debug.

Although because of this we ended up with redundant code as some aspects of the IR builder dependedon parts of the Listener. To account for this we ended up passing around responsibilities and the two classesended up working in sync. Basically the Listener called some functions inside the IR builder and then waitedfor something to be returned. This allowed us to continue to use the design we chose earlier on with littlework around.

20

We first tried to do step 3 in a different manner, however we ended up having to make some changesto just get it printing (to receive credit for that part). After looking into step 4 we realized that some ofthe decisions we had made in step 3 were going to make things very complicated. Rather than try to fixthem and patch some issues we decided to rework our solution to part 3 using a tree style structure. Thisallowed us to easily move into the fourth step and complete the parser. In so doing we realized that thedesign choices you make at any point along the way can have a huge impact on the outcome of the systemyou are trying to design.

Section 7: Software Development Life Cycle Model

For this project we used the Agile development approach. In so doing we decided who would beresponsible for ensuring that each step in the project was started and completed when necessary. Howeveras we moved through the development process those roles and duties changed as we determined who wasbetter fitted for each part of the project.

Though each member of the team was given an assignment for a particular part of the project we allhelped work on most parts. All three team members were involved in proof reading and code developmentfor each step of the project. About once a week we shared an email with the group pertaining to where westood in respect to each steps requirements and deliverables. Because OCF is online we were able to watchthe progress of each step while we were not together as a team.

While teamwork and collaboration are necessary for a project of this scale, it is true that the majorityof our advancements came when we were each working on our parts alone. For each step we got most of thework done together and during lab times but the final breakthroughs and completions came while we wereat home working. This is something that is important to consider when planning a large project, havingtime to work independently but also the time to come together as a team and see what needs to be done.

21

Compilers - Montana State University

Documents

Transcript of Compilers - Montana State University