Instruction Set Architecture

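This chapter covers instruction set architecture: instruction set design, the basic issues in designing an instruction set, and the instruction execution cycle. It introduces the topics an instruction set has to cover and discusses the factors that determine the complexity of an instruction set (IS), including instruction format or encoding, data types and sizes, successor instruction (flow control), and operations.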

1. ECE 4680: Computer Architecture and Organization
   Instruction Set Architecture
   ° Different styles of ISA
   ° Basic issues when designing an ISA
   ° What should a good ISA be?
   ECE4680 Lec 3 ISA.1 February 6, 2002

   Instruction Set Design
   [Figure: software / instruction set / hardware, with the instruction set drawn between software and hardware]
   An instruction is a binary code that specifies a basic operation (e.g. add, subtract, and, or) for the computer.
   • Operation code: defines the operation type
   • Operands: the operation's sources and destination

2. Instruction Set Architecture
   Programmer's view       Computer's view
   ADD                     01010
   SUBTRACT                01110
   AND                     10011
   OR                      10001
   COMPARE                 11010
   ...                     ...
   [Figure: a computer program (instructions) running on a machine made of CPU, memory, and I/O]

   Princeton (Von Neumann) Architecture
   - Data and instructions mixed in the same memory ("stored program computer")
   - Program as data (dubious advantage)
   - Storage utilization
   - Single memory interface
   Harvard Architecture
   - Data and instructions in separate memories
   - Has advantages in certain high performance implementations

   Basic Issues in Instruction Set Design
   - What operations (and how many) should be provided?
     LD/ST/INC/BRN are sufficient to encode any computation, but not useful because programs would be too long!
   - How (and how many) operands are specified?
     Most operations are dyadic (e.g., A <- B + C); some are monadic (e.g., A <- ~B).
   - How to encode these into consistent instruction formats?
     Instructions should be multiples of the basic data/address widths.
   Typical instruction set:
   ° 32-bit word
   ° basic operand addresses are 32 bits long
   ° basic operands, like integers, are 32 bits long
   ° in the general case, an instruction could reference 3 operands (A := B + C)
   Challenge: encode operations in a small number of bits! Three full 32-bit addresses alone would take 96 bits, three times the word size.

3. Execution Cycle
   Instruction Fetch     Obtain instruction from program storage
   Instruction Decode    Determine required actions and instruction size
   Operand Fetch         Locate and obtain operand data
   Execute               Compute result value or status
   Result Store          Deposit results in storage for later use
   Next Instruction      Determine successor instruction

   What Must be Specified?
   ° Instruction format or encoding
     - how is it decoded?
   ° Data type and size
     - what are supported
   ° Location of operands and result (addressing mode)
     - where other than memory?
     - how many explicit operands?
     - how are memory operands located?
     - which can or cannot be in memory?
   ° Operations
     - what are supported
   ° Successor instruction (flow control)
     - jumps, conditions, branches
     - fetch-decode-execute is implicit!
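The execution cycle above can be sketched as a small interpreter loop. This is only an illustration under stated assumptions: the one-accumulator toy machine, its opcode names (`load`, `add`, `store`, `brnz`, `halt`), and the tuple encoding are hypothetical and do not come from the lecture.

```python
# Hypothetical toy machine used only to illustrate the execution cycle.
# Instructions are tuples such as ("add", "B"); memory is a dict.

def run(program, memory):
    """Run `program` against `memory` and return the updated memory."""
    acc = 0   # single accumulator register
    pc = 0    # program counter
    while pc < len(program):
        instr = program[pc]               # instruction fetch
        op, *args = instr                 # instruction decode
        next_pc = pc + 1                  # default successor instruction
        if op == "load":
            acc = memory[args[0]]         # operand fetch, result kept in acc
        elif op == "add":
            acc = acc + memory[args[0]]   # operand fetch and execute
        elif op == "store":
            memory[args[0]] = acc         # result store
        elif op == "brnz":
            if acc != 0:                  # flow control: branch if acc != 0
                next_pc = args[0]
        elif op == "halt":
            break
        else:
            raise ValueError(f"unknown opcode: {op}")
        pc = next_pc                      # next instruction
    return memory


mem = run(
    [("load", "A"), ("add", "B"), ("store", "C"), ("halt",)],
    {"A": 2, "B": 3, "C": 0},
)
print(mem["C"])
```

Note that only the branch changes `next_pc`; for every other instruction the successor is implicit, which is the point of the "fetch-decode-execute is implicit" remark.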

4. Topics to be covered
   We will discuss the following topics, which determine the complexity of an instruction set:
   ° Instruction format or encoding
     - how is it decoded?
   ° Data type and size
     - what are supported
   ° Location of operands and result (addressing mode)
     - where other than memory?
     - how many explicit operands?
     - how are memory operands located?
     - which can or cannot be in memory?
   ° Operations
     - what are supported
   ° Successor instruction (flow control)
     - jumps, conditions, branches

   Basic ISA Classes
   Accumulator (earliest machines):
     1 address      add A           acc ← acc + mem[A]
     1+x address    addx A          acc ← acc + mem[A + x]
   Stack (HP calculator, Java virtual machines):
     0 address      add             tos ← tos + next
   General Purpose Register (e.g. Intel 80x86, Motorola 68xxx):
     2 address      add A B         EA(A) ← EA(A) + EA(B)
     3 address      add A B C       EA(A) ← EA(B) + EA(C)
   Load/Store (e.g. SPARC, MIPS, PowerPC):
     3 address      add Ra Rb Rc    Ra ← Rb + Rc
                    load Ra Rb      Ra ← mem[Rb]
                    store Ra Rb     mem[Rb] ← Ra
   Comparison: Bytes per instruction? Number of instructions? Cycles per instruction?

5. Comparing Number of Instructions
   ° Code sequence for C = A + B for four classes of instruction sets:
   Stack       Accumulator    Register             Register
                              (register-memory)    (load-store)
   Push A      Load A         Load R1,A            Load R1,A
   Push B      Add B          Add R1,B             Load R2,B
   Add         Store C        Store C,R1           Add R3,R1,R2
   Pop C                                           Store C,R3

   General Purpose Registers Dominate
   ° Since 1975, all machines use general purpose registers (the Java Virtual Machine, which adopts a stack architecture, is an exception)
   ° Advantages of registers:
     • registers are faster than memory
     • registers are easier for a compiler to use
       - e.g., for (A*B) - (C*D) - (E*F), the multiplies can be done in any order, unlike on a stack
     • registers can hold variables
       - memory traffic is reduced, so the program is sped up (since registers are faster than memory)
       - code density improves (since a register is named with fewer bits than a memory location)
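To make the comparison concrete, the stack and load-store sequences for C = A + B can be run on two tiny interpreters. This is a sketch only: the tuple encoding and the function names `run_stack` and `run_load_store` are assumptions made for illustration, not part of the lecture.

```python
# Hypothetical interpreters for two of the four ISA classes.

def run_stack(program, memory):
    """Stack machine: operands are implicit (top of stack)."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(memory[args[0]])
        elif op == "pop":
            memory[args[0]] = stack.pop()
        elif op == "add":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return memory


def run_load_store(program, memory):
    """Load-store machine: only load/store touch memory."""
    regs = {}
    for op, *args in program:
        if op == "load":        # load Rd, addr
            regs[args[0]] = memory[args[1]]
        elif op == "store":     # store addr, Rs
            memory[args[0]] = regs[args[1]]
        elif op == "add":       # add Rd, Rs1, Rs2
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return memory


stack_prog = [("push", "A"), ("push", "B"), ("add",), ("pop", "C")]
ls_prog = [("load", "R1", "A"), ("load", "R2", "B"),
           ("add", "R3", "R1", "R2"), ("store", "C", "R3")]

stack_mem = run_stack(stack_prog, {"A": 2, "B": 3, "C": 0})
ls_mem = run_load_store(ls_prog, {"A": 2, "B": 3, "C": 0})
print(stack_mem["C"], ls_mem["C"], len(stack_prog), len(ls_prog))
```

Both sequences take four instructions here, but the load-store instructions name up to three operands each, so instruction count alone does not settle the bytes-per-instruction question raised on the slide.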

6. More About Registers
   [Figure: MBus module containing a SuperSPARC processor (registers, datapath, control, internal cache) and an external cache]
   ° Integrated together with the datapath on the processor chip.
   ° At the top level of the memory hierarchy.
   ° Faster to access and simpler to use.
   ° Special role in the MIPS ISA: only registers, not symbolic variables, can appear in instructions.
   ° The number of registers should be neither too large nor too small.
   ° Effective use of registers is a key to program performance.

   Examples of Register Usage
   Memory addresses per       Maximum operands per       Examples
   typical ALU instruction    typical ALU instruction
   0                          3                          SPARC, MIPS, Precision Architecture, PowerPC
   1                          2                          Intel 80x86, Motorola 68000
   2                          2                          VAX (also has 3-operand formats)
   3                          3                          VAX (also has 2-operand formats)

7. Example
   In VAX:
      ADDL (R9), (R10), (R11)    ; mem[R9] <-- mem[R10] + mem[R11]
   In MIPS:
      lw  R1, (R10)              ; load a word
      lw  R2, (R11)
      add R3, R1, R2             ; R3 <-- R1 + R2
      sw  R3, (R9)               ; store a word

   Pros and Cons of Number of Memory Operands/Operands
   Register-register: 0 memory operands/instr, 3 (register) operands/instr
   + Simple, fixed-length instruction encoding. Simple code generation model. Instructions take similar numbers of clocks to execute.
   - Higher instruction count than architectures with memory references in instructions. Some instructions are short and bit encoding may be wasteful.
   Register-memory (1,2)
   + Data can be accessed without loading first. Instruction format tends to be easy to encode and yields good density.
   - Operands are not equivalent, since a source operand in a binary operation is destroyed. Encoding a register number and a memory address in each instruction may restrict the number of registers. Clocks per instruction varies by operand location.
   Memory-memory (3,3)
   + Most compact. Doesn't waste registers for temporaries.
   - Large variation in instruction size, especially for three-operand instructions. Also, large variation in work per instruction. Memory accesses create a memory bottleneck.

8. Summary on Instruction Classes
   ° Expect new instruction set architectures to use general purpose registers
   ° Pipelining => expect them to use the load-store variant of GPR ISA

   Memory Addressing
   • Byte addressing: since 1980, almost every machine uses addresses to the level of 8 bits
   • Two questions for the design of an ISA:
     - A 32-bit word can be read as four byte loads from sequential byte addresses or as one word load from a single byte address. How do byte addresses map onto words?
     - Can a word be placed on any byte boundary?

9. Addressing Objects
   Big Endian: the word address is the address of the most significant byte
     IBM 360/370, Motorola 68k, MIPS, Sparc, HP PA
   Little Endian: the word address is the address of the least significant byte
     Intel 80x86, DEC Vax
   Byte numbering within word 0 (msb on the left, lsb on the right):
     little endian:  3 2 1 0
     big endian:     0 1 2 3
   Alignment: require that objects fall on an address that is a multiple of their size. (p 112)

   Big Endian versus Little Endian (P113 & A-46)
   Example 1: Memory layout of a number #ABCD
     Address    Big Endian    Little Endian
     $1001      CD            AB
     $1000      AB            CD
   Example 2: Memory layout of a number #FF00
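The layouts in Example 1 and the alignment rule can be checked with Python's built-in `int.to_bytes` and `int.from_bytes`. The helper names `memory_layout`, `is_aligned`, and `byte_swap` and the base address 0x1000 are illustrative choices, not anything defined in the lecture.

```python
# Sketch: byte order and alignment, using only built-in int methods.

def memory_layout(value, size, byteorder, base=0x1000):
    """Map each byte address to the byte stored there (two hex digits)."""
    data = value.to_bytes(size, byteorder=byteorder)
    return {base + i: f"{b:02X}" for i, b in enumerate(data)}


def is_aligned(address, size):
    """An object is aligned if its address is a multiple of its size."""
    return address % size == 0


def byte_swap(value, size):
    """Reinterpret a big-endian byte sequence as little-endian."""
    return int.from_bytes(value.to_bytes(size, "big"), "little")


big = memory_layout(0xABCD, 2, "big")        # AB at 0x1000, CD at 0x1001
little = memory_layout(0xABCD, 2, "little")  # CD at 0x1000, AB at 0x1001
print(big, little)
print(is_aligned(0x1004, 4), is_aligned(0x1002, 4))
print(hex(byte_swap(0xABCD, 2)))
```

Calling `memory_layout(0xFF00, 2, ...)` with each byte order gives the layouts asked for in Example 2.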

10. Byte Swap Problem
   Memory layout of a number ABCDEFGH (byte address increases upward):
     Address    Big Endian    Little Endian
     3          GH            AB
     2          EF            CD
     1          CD            EF
     0          AB            GH
   Each system is self-consistent, but problems arise when they need to communicate!

   Addressing Modes
   Addressing mode       Example                Meaning
   Immediate             Add R4,#3              R4 ← R4+3
   Register              Add R4,R3              R4 ← R4+R3
   Register indirect     Add R4,(R1)            R4 ← R4+Mem[R1]
   Displacement          Add R4,100(R1)         R4 ← R4+Mem[100+R1]
   Indexed               Add R3,(R1+R2)         R3 ← R3+Mem[R1+R2]
   Direct or absolute    Add R1,(1001)          R1 ← R1+Mem[1001]
   Memory indirect       Add R1,@(R3)           R1 ← R1+Mem[Mem[R3]]
   Auto-increment        Add R1,(R2)+           R1 ← R1+Mem[R2]; R2 ← R2+d
   Auto-decrement        Add R1,-(R2)           R2 ← R2-d; R1 ← R1+Mem[R2]
   Scaled                Add R1,100(R2)[R3]     R1 ← R1+Mem[100+R2+R3*d]
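The "Meaning" column of the addressing mode table can be read as a recipe for locating the source operand. The sketch below follows that column for eight of the modes; the function name `fetch_operand`, its keyword parameters, and the sample register and memory contents are hypothetical. Auto-increment and auto-decrement are left out because they also modify a register.

```python
# Sketch: how each addressing mode locates the source operand.
# regs maps register names to values; mem maps addresses to values.

def fetch_operand(mode, regs, mem, reg=None, index=None, const=0, d=4):
    """Return the source operand selected by `mode`."""
    if mode == "immediate":
        return const                                   # #const
    if mode == "register":
        return regs[reg]                               # Rn
    if mode == "register_indirect":
        return mem[regs[reg]]                          # Mem[Rn]
    if mode == "displacement":
        return mem[const + regs[reg]]                  # Mem[const + Rn]
    if mode == "indexed":
        return mem[regs[reg] + regs[index]]            # Mem[Rn + Rm]
    if mode == "direct":
        return mem[const]                              # Mem[const]
    if mode == "memory_indirect":
        return mem[mem[regs[reg]]]                     # Mem[Mem[Rn]]
    if mode == "scaled":
        return mem[const + regs[reg] + regs[index] * d]  # Mem[const + Rn + Rm*d]
    raise ValueError(f"unknown addressing mode: {mode}")


regs = {"R1": 100, "R2": 8, "R3": 2}
mem = {7: 15, 100: 7, 108: 11, 200: 9, 208: 17, 1001: 13}

print(fetch_operand("displacement", regs, mem, reg="R1", const=100))
print(fetch_operand("memory_indirect", regs, mem, reg="R1"))
```

Memory indirect needs two memory reads for one operand, which hints at why richer modes add to machine complexity.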

11. Addressing Mode
   • Addressing modes have the ability to significantly reduce instruction counts
   • They also add to the complexity of building a machine

   Addressing Mode Usage (3 programs)
   - ... avg, 17% to 43%
   - Register deferred (indirect): 13% avg, 3% to 24%
   - Scaled: 7% avg, 0% to 16%
   - Memory indirect: 3% avg, 1% to 6%
   - Misc: 2% avg, 0% to 3%

12. Displacement Address Size
   [Figure: distribution of displacement sizes; x-axis: 0 to 15; y-axis ticks: 0% to 30%; series: Int. Avg., FP Avg.]
   Average of 5 programs from SPECint92 and average of 5 programs from SPECfp92
   X-axis is in powers of 2: => addresses > 2^3 (8) and < 2^4 (16)
   1% of addresses > 16 bits

   Immediate Size
   • 50% to 60% fit within 8 bits
   • 75% to 80% fit within 16 bits
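Whether a displacement or immediate "fits within n bits" can be tested directly. The sketch below assumes the field is a signed two's-complement value, which the slides do not state; the function names are illustrative.

```python
# Sketch: does a value fit in an n-bit signed (two's complement) field?
# Assumption: displacement/immediate fields are signed.

def fits_signed(value, bits):
    """True if `value` is representable in `bits` bits, two's complement."""
    low = -(1 << (bits - 1))
    high = (1 << (bits - 1)) - 1
    return low <= value <= high


def bits_needed(value):
    """Smallest two's-complement field width that can hold `value`."""
    bits = 1
    while not fits_signed(value, bits):
        bits += 1
    return bits


print(fits_signed(127, 8), fits_signed(128, 8))
print(bits_needed(32767), bits_needed(32768))
```

Measuring `bits_needed` over the constants of real programs yields the kind of distribution summarized on these two slides.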

13. Addressing Summary
   • Data addressing modes that are important: displacement, immediate, register indirect
   • Displacement size should be 12 to 16 bits
   • Immediate size should be 8 to 16 bits

   Typical Operations
   Data Movement              Load (from memory)
                              Store (to memory)
                              memory-to-memory move
                              register-to-register move
                              input (from I/O device)
                              output (to I/O device)
                              push, pop (to/from stack)
   Arithmetic                 integer (binary + decimal) or FP
                              Add, Subtract, Multiply, Divide
   Logical                    not, and, or, set, clear
   Shift                      shift left/right, rotate left/right
   Control (Jump/Branch)      unconditional, conditional
   Subroutine Linkage         call, return
   Interrupt                  trap, return
   Synchronization            test & set (atomic r-m-w)
   String                     search, translate

14. Top 10 80x86 Instructions
   Rank    Instruction                Integer average (percent of total executed)
   1       load                       22%
   2       conditional branch         20%
   3       compare                    16%
   4       store                      12%
   5       add                        8%
   6       and                        6%
   7       sub                        5%
   8       move register-register     4%
   9       call                       1%
   10      return                     1%
           Total                      96%
   ° Simple instructions dominate instruction frequency

   Methods of Testing Condition
   ° Condition Codes
     Processor status bits are set as a side effect of arithmetic instructions (possibly on moves) or explicitly by compare or test instructions.
     Ex:  add r1, r2, r3
          bz  label
   ° Condition Register
     Ex:  cmp r1, r2, r3     ; compare r2 with r3, 0 or 1 is stored in r1
          bgt r1, label      ; branch on greater
   ° Compare and Branch
     Ex:  bgt r1, r2, label  ; if r1 > r2, then go to label

15. Condition Codes
   Setting CC as a side effect can reduce the number of instructions:
     X:  .                         X:  .
         .                             .
         .             versus          .
         SUB r0, #1, r0                SUB r0, #1, r0
         BRP X                         CMP r0, #0
                                       BRP X
   But it also has disadvantages:
   - Not all instructions set the condition codes, and which do and which do not is often confusing! E.g., the shift instruction sets the carry bit.
   - There is a dependency between the instruction that sets the CC and the one that tests it: to overlap their execution, they may need to be separated by an instruction that does not change the CC.
   [Figure: timing of two overlapped instructions (ifetch, read, compute, write), marking where the old CC is read and where the new CC is computed]

   Branches: Conditional Control Transfers
   Four basic conditions:
     N: negative    V: overflow    Z: zero    C: carry
   Sixteen combinations of the basic four conditions (~ is NOT, + is OR, ⊕ is XOR):
     Always                     Unconditional
     Never                      NOP
     Not Equal                  ~Z
     Equal                      Z
     Greater                    ~[Z + (N ⊕ V)]
     Less or Equal              Z + (N ⊕ V)
     Greater or Equal           ~(N ⊕ V)
     Less                       N ⊕ V
     Greater Unsigned           ~(C + Z)
     Less or Equal Unsigned     C + Z
     Carry Clear                ~C
     Carry Set                  C
     Positive                   ~N
     Negative                   N
     Overflow Clear             ~V
     Overflow Set               V
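The branch conditions in the table can be evaluated from the four flags. The sketch below computes N, Z, V, C for a 32-bit subtraction a - b and then applies some of the table's formulas. It assumes that C means "borrow" after a subtract, the convention under which the table's unsigned tests hold; the slides do not say which convention is intended. The function names are illustrative.

```python
# Sketch: condition codes after a 32-bit subtract, then branch tests.
# Assumption: C is set when the subtract borrows (a < b as unsigned).

MASK = 0xFFFFFFFF


def sub_flags(a, b):
    """Return (N, Z, V, C) after computing a - b on 32-bit values."""
    a &= MASK
    b &= MASK
    result = (a - b) & MASK
    n = (result >> 31) & 1
    z = int(result == 0)
    sign_a, sign_b = a >> 31, b >> 31
    # Signed overflow: operand signs differ and the result's sign differs from a's.
    v = int(sign_a != sign_b and n != sign_a)
    c = int(a < b)
    return n, z, v, c


def conditions(n, z, v, c):
    """Evaluate a subset of the sixteen branch conditions."""
    return {
        "equal": bool(z),
        "not_equal": not z,
        "less": bool(n ^ v),
        "greater_or_equal": not (n ^ v),
        "greater": not (z or (n ^ v)),
        "less_or_equal": bool(z or (n ^ v)),
        "greater_unsigned": not (c or z),
        "less_or_equal_unsigned": bool(c or z),
    }


# -1 versus 1: less when signed, greater when unsigned (0xFFFFFFFF > 1).
cond = conditions(*sub_flags(-1, 1))
print(cond["less"], cond["greater_unsigned"])
```

The signed tests use N ⊕ V rather than N alone so that they stay correct when the subtraction overflows.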

16. Conditional Branch Distance
   [Figure: distribution of conditional branch distances; x-axis: bits of branch displacement (0 to 15); y-axis ticks: 0% to 40%; series: Int. Avg., FP Avg.]
   • Distance from branch in instructions: 2^i => ≤ ±2^(i-1)
   • 25% of integer branches are > 2^2

   Conditional Branch Addressing
   • PC-relative, since most branch targets are close to the branch; at least 8 bits suggested (±128 instructions)
   • Compare Equal/Not Equal most important for integer programs
   Frequency of comparison types in branches:
     Comparison    Int Avg.    FP Avg.
     LT/GE         7%          40%
     GT/LE         7%          23%
     EQ/NE         86%         37%

17. Operation Summary
   • Support these simple instructions, since they will dominate the number of instructions executed: load, store, add, subtract, move register-register, and, shift, compare equal, compare not equal, branch (with a PC-relative address at least 8 bits long), jump, call, return

   Data Types
   Bit: 0, 1
   Bit string: sequence of bits of a particular length
     4 bits is a nibble
     8 bits is a byte
     16 bits is a half-word
     32 bits is a word
   Character:
     ASCII      7-bit code
     EBCDIC     8-bit code (IBM)
     UNICODE    16-bit code (Java)
   Decimal: digits 0-9 encoded as 0000b through 1001b; two decimal digits packed per 8-bit byte
   Integers (positive numbers are the same in all three):
     Sign & magnitude:    0X vs. 1X
     1's complement:      0X vs. 1(~X)
     2's complement:      0X vs. (1's comp) + 1
     The first two have two zeros; the last one is usually chosen.
   Floating point: M x R^E (M: mantissa, R: base, E: exponent)
     Single precision, double precision, extended precision
     How many +/- numbers? Where is the decimal point? How are +/- exponents represented?

18. Operand Size Usage
   Frequency of reference by size:
     Size          Int Avg.    FP Avg.
     Doubleword    0%          69%
     Word          74%         31%
     Halfword      19%         0%
     Byte          7%          0%
   • Support these data sizes and types: 8-bit, 16-bit, 32-bit integers and 32-bit and 64-bit IEEE 754 floating point numbers

   Instruction Format
   • If there are many memory operands per instruction and many addressing modes, an address specifier is needed per operand
   • For a load-store machine with 1 address per instruction and one or two addressing modes, just encode the addressing mode in the opcode

19. Generic Examples of Instruction Formats
   [Figure: layouts of the three format styles: Variable: …, Fixed, Hybrid]

   Summary of Instruction Formats
   • If code size is most important, use variable length instructions
   • If performance is most important, use fixed length instructions

20. Instruction Set Metrics
   Design-time metrics:
   ° Can it be implemented, in how long, at what cost?
   ° Can it be programmed? Ease of compilation?
   Static metrics:
   ° How many bytes does the program occupy in memory?
   Dynamic metrics:
   ° How many instructions are executed?
   ° How many bytes does the processor fetch to execute the program?
   ° How many clocks are required per instruction (CPI)?
   ° How "lean" a clock is practical?
   Best metric: time to execute the program!
     Time = Inst. Count x CPI x Cycle Time
   NOTE: this depends on the instruction set, processor organization, and compilation techniques.

   Lecture Summary: ISA
   ° Use general purpose registers with a load-store architecture;
   ° Support these addressing modes: displacement (with an address offset size of 12 to 16 bits), immediate (size 8 to 16 bits), and register deferred;
   ° Support these simple instructions, since they will dominate the number of instructions executed: load, store, add, subtract, move register-register, and, shift, compare equal, compare not equal, branch (with a PC-relative address at least 8 bits long), jump, call, and return;
   ° Support these data sizes and types: 8-bit, 16-bit, 32-bit integers and 32-bit and 64-bit IEEE 754 floating point numbers;
   ° Use fixed instruction encoding if interested in performance and use variable instruction encoding if interested in code size;
   ° Provide at least 16 general purpose registers plus separate floating-point registers, be sure all addressing modes apply to all data transfer instructions, and aim for a minimalist instruction set.
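The "best metric" combines the three dynamic quantities as instruction count times clocks per instruction times cycle time. The sketch below applies that product to two made-up designs; every number in it is hypothetical and chosen only to show that a higher instruction count can still win.

```python
# Sketch: execution time = instruction count x CPI x cycle time.
# All design parameters below are hypothetical illustration values.

def cpu_time(instruction_count, cpi, cycle_time_s):
    """Return program execution time in seconds."""
    return instruction_count * cpi * cycle_time_s


# Design A: load-store style, more instructions but fewer clocks each.
design_a = cpu_time(instruction_count=1_000_000, cpi=1.2, cycle_time_s=2e-9)
# Design B: memory-operand style, fewer instructions but more clocks each.
design_b = cpu_time(instruction_count=800_000, cpi=2.0, cycle_time_s=2e-9)

print(f"A: {design_a * 1e3:.2f} ms, B: {design_b * 1e3:.2f} ms")
```

With these assumed numbers, design A finishes first despite executing 25% more instructions, which is why no single factor of the product is a sufficient metric by itself.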