编译过程

本章节主要介绍了一个较为简单的编译全过程,介绍了Context-free Grammars (CFG) 与编译器之间的关系,在语言的识别和转化中利用了较多的解析树进行解析,进一步定义词法分析以及它和普通的表达,语法之间的关系和区别。
展开查看详情

1.Chapter 2: A Simple One Pass Compiler Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut 371 Fairfield Way, Unit 2155 Storrs, CT 06269-3155 steve@engr.uconn.edu http://www.engr.uconn.edu/~steve (860) 486 - 4818 Material for course thanks to: Laurent Michel Aggelos Kiayias Robert LeBarre

2.The Entire Compilation Process Grammars for Syntax Definition Syntax-Directed Translation Parsing - Top Down & Predictive Pulling Together the Pieces The Lexical Analysis Process Symbol Table Considerations A Brief Look at Code Generation Concluding Remarks/Looking Ahead

3.Grammars for Syntax Definition A Context-free Grammar ( CFG ) Is Utilized to Describe the Syntactic Structure of a Language A CFG Is Characterized By: 1. A Set of Tokens or Terminal Symbols 2. A Set of Non-terminals 3. A Set of Production Rules Each Rule Has the Form NT  {T, NT}* 4. A Non-terminal Designated As the Start Symbol

4.Grammars for Syntax Definition Example CFG list  list + digit list  list - digit list  digit digit  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 (the “|” means OR) (So we could have written list  list + digit | list - digit | digit )

5.Grammars are Used to Derive Strings: Using the CFG defined on the previous slide, we can derive the string: 9 - 5 + 2 as follows: list  list + digit  list - digit + digit  digit - digit + digit  9 - digit + digit  9 - 5 + digit  9 - 5 + 2 P1 : list  list + digit P2 : list  list - digit P3 : list  digit P4 : digit  9 P4 : digit  5 P4 : digit  2

6.Grammars are Used to Derive Strings: This derivation could also be represented via a Parse Tree (parents on left, children on right) list digit digit list digit list 9 5 2 - + list  list + digit  list - digit + digit  digit - digit + digit  9 - digit + digit  9 - 5 + digit  9 - 5 + 2

7.A More Complex Grammar What is this grammar for ? What does “  ” represent ? What kind of production rule is this ? block  begin opt_stmts end opt_stmts  stmt_list |  stmt_list  stmt_list ; stmt | stmt

8.Defining a Parse Tree More Formally, a Parse Tree for a CFG Has the Following Properties: Root Is Labeled With the Start Symbol Leaf Node Is a Token or  Interior Node (Now Leaf) Is a Non-Terminal If A  x1x2…xn, Then A Is an Interior; x1x2…xn Are Children of A and May Be Non - Terminals or Tokens

9.Other Important Concepts Ambiguity string string string string string + 2 - 5 9 Why is this a Problem ? Grammar: string  string + string | string – string | 0 | 1 | …| 9 Two derivations (Parse Trees) for the same token string. string string string string string - 9 + 5 2

10.Other Important Concepts Associativity of Operators Left vs. Right right letter letter right letter right c b a - + right  letter + right | letter – right | letter letter  a | b | c | …| z list digit digit list digit list 9 5 2 - +

11.Other Important Concepts Operator Precedence What does 9 + 5 * 2 mean? Typically ( ) * / + - is precedence order This can be incorporated into a grammar via rules: expr  expr + term | expr – term | term term  term * factor | term / factor | factor factor  digit | ( expr ) digit  0 | 1 | 2 | 3 | … | 9 Precedemce Achieved by: expr & term for each precedence level Rules for each are left recursive or associate to the left

12.Syntax-Directed Translation Associate Attributes With Grammar Rules & Constructs and Translate As Parsing Occurs Our Example Uses Infix to Postfix Notation Translation for Expressions Translation May Be Defined Inductively As: Postfix( e ) , E is an Expression 1. If E is a variable | constant  Postfix( E ) = E 2. If E is E1 op E2  Postfix( E ) = Postfix( E1 op E2 ) = Postfix( E1 ) Postfix( E2 ) op 3. If E is (E1)  Postfix( E ) = Postfix( E1 ) Examples: ( 9 – 5 ) + 2  9 5 – 2 + 9 – ( 5 + 2 )  9 5 2 + -

13.Syntax-Directed Definition: (2 parts ) Each Production Has a Set of Semantic Rules Each Grammar Symbol Has a Set of Attributes For the Following Example, String Attribute “ t ” is Associated With Each Grammar Symbol, i.e., What is a Derivation for 9 + 5 - 2? expr  expr – term | expr + term | term term  0 | 1 | 2 | 3 | … | 9

14.Derivation for 9 + 5 - 2 expr  expr + ter m  expr – term + term  term – term + term  9 – term + term  9 – 5 + term  9 – 5 + 2 expr  expr – term | expr + term | term term  0 | 1 | 2 | 3 | … | 9 Derivation Using Rule expr  expr + ter m expr  expr + term expr  term term  9 term  5 term  2

15.Syntax-Directed Definition: (2 parts ) Each Production Rule of the CFG Has a Semantic Rule Note : Semantic Rules for expr Use Synthesized Attributes Which Obtain Their Values From Other Rules. Production Semantic Rule expr  expr + term expr.t := expr.t || term.t || ‘+’ expr  expr – term expr.t := expr.t || term.t || ’-’ expr  term expr.t := term.t term  0 term.t := ‘ 0 ’ term  1 term.t := ‘ 1 ’ …. …. term  9 term.t := ‘ 9 ’

16.Semantic Rules are Embedded in Parse Tree expr.t = 95- expr.t = 9 expr.t = 95-2+ term.t = 5 term.t = 2 term.t = 9 2 + 5 - 9 How Do Semantic Rules Work ? What Type of Tree Traversal is Being Performed? How Can We More Closely Associate Semantic Rules With Production Rules ?

17.Examples rest  + term rest  rest  + term { print (‘+’)} rest (Print ‘+’ After term for postfix translation) expr  expr + term { print (‘+’)}  expr - term { print (‘-’)}  term term  0 { print (‘0’)} term  1 { print (‘1’)} … term  9 { print (‘9’)} term term term expr expr expr 9 5 2 - + { print (‘-’)} { print (‘9’)} { print (‘5’)} { print (‘2’)} { print (‘+’)}

18.Parsing – Top-Down & Predictive Top-Down Parsing  Parse tree / derivation of a token string occurs in a top down fashion. For Example, Consider: type  simple |  id | array [ simple ] of type simple  integer | char | num dotdot num Suppose input is : array [ num dotdot num ] of integer The parse would begin with the rule type  array [ simple ] of type

19.Top-Down Parse (type = start symbol) type ] simple of [ array type type ] simple of [ array type num num dotdot Input : array [ num dotdot num ] of integer Tokens

20.Top-Down Parse (type = start symbol) Input : array [ num dotdot num ] of integer type ] simple of [ array type num num dotdot simple type ] simple of [ array type num num dotdot simple integer

21.Top-Down Process Recursive Descent or Predictive Parsing Parser Operates by Attempting to Match Tokens in the Input Stream Utilize both Grammar and Input Below to Motivate Code for Algorithm array [ num dotdot num ] of integer type  simple |  id | array [ simple ] of type simple  integer | char | num dotdot num procedure match ( t : token ) ; begin if lookahead = t then lookahead : = nexttoken else error end ;

22.Top-Down Algorithm (Continued) procedure type ; begin if lookahead is in { integer, char, num } then simple else if lookahead = ‘ ’ then begin match ( ‘ ’ ) ; match ( id ) end else if lookahead = array then begin match ( array ); match (‘[‘); simple; match (‘]’); match ( of ); type end else error end ; procedure simple ; begin if lookahead = integer then match ( integer ); else if lookahead = char then match ( char ); else if lookahead = num then begin match ( num ); match ( dotdot ); match ( num ) end else error end ;

23.Left Recursion in CFG May Cause Parser to Loop Forever Left – made right choice Right – top down parsing could keep making wrong choice Problem with Top Down Parsing expr  expr + term | expr - term | term term  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 expr  expr + ter m  expr – term + term  term – term + term  9 – term + term  9 – 5 + term  9 – 5 + 2 expr  expr + ter m  expr – term + term  expr – term – term + term  expr – expr – term – term + term  etc.

24.Problem with Top Down Parsing expr  expr + term | expr - term | term term  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 expr  term rest rest  + term rest | - term rest |  term  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 * New Semantic Actions ! rest  + term { print (‘+’)} rest | - term { print (‘-’)} rest |  Solution : Algorithm to Remove Left Recursion

25.Comparing Grammars with Left Recursion Notice Location of Semantic Actions in Tree What is Order of Processing? expr expr expr term term term { print (‘2’)} { print (‘+’)} { print (‘5’)} { print (‘-’)} { print (‘9’)} 5 + 2 - 9

26.Comparing Grammars without Left Recursion Now, Notice Location of Semantic Actions in Tree for Revised Grammar What is Order of Processing in this Case? { print (‘2’)} expr term term { print (‘-’)} term { print (‘+’)} { print (‘5’)} { print (‘9’)} rest rest 2 5 - 9 +  rest

27.The Lexical Analysis Process A Graphical Depiction uses getchar ( ) to read character pushes back c using ungetc (c , stdin) returns token to caller tokenval Sets global variable to attribute value lexan ( ) lexical analyzer

28.The Lexical Analysis Process Functional Responsibilities Input Token String Is Broken Down White Space and Comments Are Filtered Out Individual Tokens With Associated Values Are Identified Symbol Table Is Initialized and Entries Are Constructed for Each “Appropriate” Token Under What Conditions will a Character be Pushed Back? Can You Cite Some Examples in Programming Language Statements?

29.Algorithm for Lexical Analyzer function lexan : integer ; var lexbuf : array [ 0 .. 100 ] of char ; c : char ; begin loop begin read a character into c ; if c is a blank or a tab then do nothing else if c is a newline then lineno : = lineno + 1 else if c is a digit then begin set tokenval to the value of this and following digits ; return NUM end