计算机语言编译器

本章节主要介绍什么是计算机语言编译器,计算机语言编译器的构成,两种主要的工具Lex/Flex 和Yacc/Bison,利用计算机语言编译器编译语言的过程,介绍了目前编译器的主要分类,以及在编译语言的过程中需要注意的语言的词法,语法以及层次的分析与构成。
展开查看详情

1.Chapter 1: Introduction to Compiling Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut 371 Fairfield Way, Unit 2155 Storrs, CT 06269-3155 steve@engr.uconn.edu http://www.engr.uconn.edu/~steve (860) 486 - 4818 Material for course thanks to: Laurent Michel Aggelos Kiayias Robert LeBarre

2.Introduction to Compilers As a Discipline, Involves Multiple CSE Areas Programming Languages and Algorithms Software Engineering & Theory / Foundations Computer Architecture & Operating Systems But, Has Surprisingly Simplistic Intent: Compiler Source program Target Program Error messages Diverse & Varied

3.What’s in a Compiler? Source Program Lexical Analyzer 1 Syntax Analyzer 2 Semantic Analyzer 3 Intermediate Code Generator 4 Code Optimizer 5 Code Generator 6 Target Program Symbol-table Manager Error Handler 1, 2, 3 : Analysis - Our Focus 4, 5, 6 : Synthesis

4.Two Main Compiler Writing Tools Lex and Flex Based on Deterministic and Non-Deterministic Finite Automata Lexical Tokens and their Order of Processing Define the Words and Punctuations Yacc and Bison Based on Context Free Grammar Formally Define the Structure of a Language with Grammar Rules Parse the Sentences of a Language

5.Lex/Flex and Yacc /Bison Two Compiler Writing Tools to Specify : Lexical Tokens and Order of Processing (Lex) Context Free Grammar for LALR(1) ( Yacc ) Both Lex and Yacc have Long History in Computing Lex and Yacc – Earliest Days of Unix Minicomputers Flex and Bison – From GNU JFlex - Fast Scanner Generator for Java BYacc /J – Berkeley CUP, ANTRL, PCYACC, … PCLEX and PCYACC from Abacus See: http://dinosaur.compilertools.net /

6.Lex – A Lexical Analyzer Generator A Unix Utility from early 1970s A Compiler that Takes as Source a Specification for: Tokens/Patterns of a Language Generates a “C” Lexical Analyzer Program Pictorially: Lex Compiler C Compiler a.out Lex Source Program: lex.y lex.yy.c lex.yy.c a.out Input stream Sequence of tokens

7.What is the Compilation Process? Consider the main.c C Program #include <stdio.h> #include <ctype.h> #include <string.h> main() { char day[10]; int m, d, y; m = 9; d = 7; y = 2011; strcpy(day, "Wednesday"); printf("Today is: "); printf("%s %d/%d/%d

8.Viewing C from Compiler Theory Tokens or Patterns for the “Words” of the Programming Language Specified in Lex/Flex Grammar Rules that Describe the Allowable Sentences of the Programming Language Specified in Yacc /Bison Other Stages of Compilation Semantic and Type Checking Analysis Intermediate Code Generation Code Optimization Final (Relocatable) Code Generation

9.Viewing C from Usage Side ANSI C Specification of Lexical Structure in Flex ANSI C Specification of Grammar Rules in Bison Multi-Stage Compilation Process from Source Code to Preprocessed Code to Assembly Code to Object Code to Executable Preprocess: gcc –E main.c Generates main.e Assembly: gcc –S main.c Generates main.s Object: gcc –c main.c Generates main.o Executable: gcc main.c Generates a.out

10.What are Tokens? Smallest Individual Units that are Recognized #include < stdio.h > #include < ctype.h > #include < string.h > main() { char day[10]; int m, d, y; m = 9; d = 7; y = 2011; strcpy (day, "Wednesday"); printf ("Today is: "); printf ("%s %d/%d/%d

11.C Lexical Specification in Flex http://www.lysator.liu.se/c/ANSI-C-grammar-l.html D [0-9] L [a-zA-Z_] H [a-fA-F0-9] E [Ee][+-]?{D}+ FS (f|F|l|L) IS ( u|U|l|L )* %{ #include < stdio.h > #include " y.tab.h " void count(); %} %% "/*" { comment(); }

12.C Flex Continued … "auto" { count(); return(AUTO); } "break" { count(); return(BREAK); } "case" { count(); return(CASE); } "char" { count(); return(CHAR); } "const" { count(); return(CONST); } "continue" { count(); return(CONTINUE); } "default" { count(); return(DEFAULT); } "do" { count(); return(DO); } "double" { count(); return(DOUBLE); } "else" { count(); return(ELSE); } "enum" { count(); return(ENUM); } "extern" { count(); return(EXTERN); } "float" { count(); return(FLOAT); } "for" { count(); return(FOR); } "goto" { count(); return(GOTO); } "if" { count(); return(IF); } "int" { count(); return(INT); }

13.C Flex Continued … "long" { count(); return(LONG); } "register" { count(); return(REGISTER); } "return" { count(); return(RETURN); } "short" { count(); return(SHORT); } "signed" { count(); return(SIGNED); } " sizeof " { count(); return(SIZEOF); } "static" { count(); return(STATIC); } " struct " { count(); return(STRUCT); } "switch" { count(); return(SWITCH); } " typedef " { count(); return(TYPEDEF); } "union" { count(); return(UNION); } "unsigned" { count(); return(UNSIGNED); } "void" { count(); return(VOID); } "volatile" { count(); return(VOLATILE); } "while" { count(); return(WHILE); } What TOKEN is Missing?

14.What Does Following Define? Any Ideas? Why is it Listed Last on the Previous Slide? D [0-9] L [a-zA-Z _] { L}({L}|{D })* { count (); return( check_type ()); } int check_type () { /*pseudo code --- this is what it should check * if ( yytext == type_name ) * return(TYPE_NAME); * return(IDENTIFIER); */ /* it actually will only return IDENTIFIER */ return(IDENTIFIER); }

15.What Does Following Define? Any Ideas? Why is it Listed Last on the Previous Slide? D [0-9] L [a-zA-Z _] { L}({L}|{D })* { count (); return( check_type ()); } int check_type () { /*pseudo code --- this is what it should check * if ( yytext == type_name ) * return(TYPE_NAME); * return(IDENTIFIER); */ /* it actually will only return IDENTIFIER */ return(IDENTIFIER); }

16.C Flex Continued … "..." { count(); return(ELLIPSIS); } ">>=" { count(); return(RIGHT_ASSIGN); } "<<=" { count(); return(LEFT_ASSIGN); } "+=" { count(); return(ADD_ASSIGN); } "-=" { count(); return(SUB_ASSIGN); } "*=" { count(); return(MUL_ASSIGN); } "/=" { count(); return(DIV_ASSIGN); } "%=" { count(); return(MOD_ASSIGN); } "&=" { count(); return(AND_ASSIGN); } "^=" { count(); return(XOR_ASSIGN); } "|=" { count(); return(OR_ASSIGN); } ">>" { count(); return(RIGHT_OP); } "<<" { count(); return(LEFT_OP); } "++" { count(); return(INC_OP); } "--" { count(); return(DEC_OP); } "->" { count(); return(PTR_OP); }

17.C Flex Continued … ("["|"<:") { count(); return([); } ("]"|":>") { count(); return(]); } "." { count(); return(.); } "&" { count(); return(

18.Remaining Code yywrap () { return(1 ); } comment () /* WHAT DOES THIS DO? */ { char c, c1 ; loop: while ((c = input()) != * && c != 0) putchar (c ); if ((c1 = input()) != / && c != 0) { unput (c1); goto loop; } if (c != 0) putchar (c1); }

19.Remaining Code int column = 0 ; void count() { int i ; for ( i = 0; yytext [ i ] != 0; i ++) if ( yytext [ i ] == n) column = 0; else if ( yytext [ i ] == t) column += 8 - (column % 8); else column ++; ECHO; } int check_type () { return(IDENTIFIER); }

20.Yacc - Yet Another Compiler Compiler https://www.slideshare.net/kinnarshah8888/ch4c Also from early 1970s A Compiler that Takes as Source a Specification for: Organization of Tokens into Grammar Rules Generates a LALR(1) Parser Pictorially:

21.A Combined View http://slideplayer.com/slide/4937457/

22.C Bison Specification http://www.lysator.liu.se/c/ANSI-C-grammar-y.html %token IDENTIFIER CONSTANT STRING_LITERAL SIZEOF %token PTR_OP INC_OP DEC_OP LEFT_OP RIGHT_OP LE_OP %token GE_OP EQ_OP NE_OP %token AND_OP OR_OP MUL_ASSIGN DIV_ASSIGN MOD_ASSIGN %token SUB_ASSIGN LEFT_ASSIGN RIGHT_ASSIGN AND_ASSIGN %token XOR_ASSIGN OR_ASSIGN TYPE_NAME ADD_ASSIGN %token TYPEDEF EXTERN STATIC AUTO REGISTER %token CHAR SHORT INT LONG SIGNED UNSIGNED FLOAT DOUBLE CONST %token VOLATILE VOID %token STRUCT UNION ENUM ELLIPSIS %token CASE DEFAULT IF ELSE SWITCH WHILE DO FOR GOTO CONTINUE %token BREAK RETURN %start translation_unit %%

23.C Bison Specification http://www.lysator.liu.se/c/ANSI-C-grammar-y.html %token IDENTIFIER CONSTANT STRING_LITERAL SIZEOF %token PTR_OP INC_OP DEC_OP LEFT_OP RIGHT_OP LE_OP %token GE_OP EQ_OP NE_OP %token AND_OP OR_OP MUL_ASSIGN DIV_ASSIGN MOD_ASSIGN %token SUB_ASSIGN LEFT_ASSIGN RIGHT_ASSIGN AND_ASSIGN %token XOR_ASSIGN OR_ASSIGN TYPE_NAME ADD_ASSIGN %token TYPEDEF EXTERN STATIC AUTO REGISTER %token CHAR SHORT INT LONG SIGNED UNSIGNED FLOAT DOUBLE CONST %token VOLATILE VOID %token STRUCT UNION ENUM ELLIPSIS %token CASE DEFAULT IF ELSE SWITCH WHILE DO FOR GOTO CONTINUE %token BREAK RETURN %start translation_unit %%

24.Bison Spec Continued … declaration_list : declaration | declaration_list declaration declaration : declaration_specifiers

25.Bison Spec Continued … statement_list : statement | statement_list statement ; statement : labeled_statement | compound_statement | expression_statement | selection_statement | iteration_statement | jump_statement ; labeled_statement : IDENTIFIER : statement | CASE constant_expression : statement | DEFAULT : statement ; List of Statements and Different Statements

26.Bison Spec Continued … statement_list : statement | statement_list statement ; statement : labeled_statement | compound_statement | expression_statement | selection_statement | iteration_statement | jump_statement ; labeled_statement : IDENTIFIER : statement | CASE constant_expression : statement | DEFAULT : statement ; List of Statements and Different Statements

27.Reviewing Compilation Process in C Current Programmers Use State of the Art IDEs IDEs focus on Higher Level Concepts Syntax Correction Capabilities Matching Parens and Brackets Sophisticated Compilation Error Messages Debugging Environment Process Hides Multi-Stage Compilation Process from Source Code to Preprocessed Code to Assembly Code to Object Code to Executable Preprocess: gcc –E main.c Generates main.e Assembly: gcc –S main.c Generates main.s Object: gcc –c main.c Generates main.o Executable: gcc main.c Generates a.out

28.Recall main.c #include < stdio.h > #include < ctype.h > #include < string.h > main() { char day[10]; int m, d, y; m = 9; d = 7; y = 2011; strcpy (day, "Wednesday"); printf ("Today is: "); printf ("%s %d/%d/%d

29.From Source to Preprocessing: What Does main.e Contain? # 1 "main.c" # 1 "<built-in>" # 1 "<command-line>" # 1 "main.c" # 1 "/usr/include/stdio.h" 1 3 4 # 28 "/usr/include/stdio.h" 3 4 # 1 "/usr/include/features.h" 1 3 4 # 330 "/usr/include/features.h" 3 4 # 1 "/usr/include/sys/cdefs.h" 1 3 4 # 348 "/usr/include/sys/cdefs.h" 3 4 # 1 "/usr/include/bits/wordsize.h" 1 3 4 # 349 "/usr/include/sys/cdefs.h" 2 3 4 # 331 "/usr/include/features.h" 2 3 4 # 354 "/usr/include/features.h" 3 4 # 1 "/usr/include/gnu/stubs.h" 1 3 4 … etc ….