Control & Operating Speed

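This lecture covers the controller: how it tells the datapath how to execute each instruction, how instruction timing is determined by instruction complexity, architecture, and technology (with an example of each), and how different goals call for different measures when evaluating performance.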

1. CS 61C: Great Ideas in Computer Architecture
Lecture 12: Control & Operating Speed
Krste Asanović & Randy Katz
http://inst.eecs.berkeley.edu/~cs61c/fa17

2. Agenda
- Finish Single-Cycle RISC-V Datapath
- Controller
- Instruction Timing
- Performance Measures
- Introduction to Pipelining
- Pipelined RISC-V Datapath
- And in Conclusion, ...
(CS 61c Lecture 12: Control & Performance)

3. Recap: Adding branches to datapath
[Figure: single-cycle datapath with PC, IMEM, register file Reg[], Imm. Gen, Branch Comp., ALU, DMEM, and the +4 adder; control signals PCSel, ImmSel, RegWEn, BrUn, BrEq, BrLT, ASel, BSel, ALUSel, MemRW, WBSel]

4. Implementing the JALR Instruction (I-Format)
JALR rd, rs1, immediate
- Writes PC+4 to Reg[rd] (return address)
- Sets PC = Reg[rs1] + immediate
- Uses the same immediates as arithmetic and loads: no multiplication by 2 bytes (unlike branches and jal)
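A minimal Python sketch of the jalr semantics above, assuming a toy 32-entry register file held in a list. The helper names `encode_jalr` and `execute_jalr` are hypothetical; the opcode 0b1100111 and the clearing of bit 0 of the jump target come from the RISC-V specification rather than from this slide.

```python
MASK32 = 0xFFFFFFFF
JALR_OPCODE = 0b1100111  # RV32I opcode for jalr (funct3 = 000)

def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit

def encode_jalr(rd: int, rs1: int, imm: int) -> int:
    """Hypothetical assembler helper for jalr rd, imm(rs1) (I-format)."""
    return ((imm & 0xFFF) << 20) | ((rs1 & 0x1F) << 15) | ((rd & 0x1F) << 7) | JALR_OPCODE

def execute_jalr(inst: int, pc: int, regs: list[int]) -> int:
    """Run jalr against a 32-entry register list and return the next PC."""
    rd = (inst >> 7) & 0x1F
    rs1 = (inst >> 15) & 0x1F
    imm = sign_extend(inst >> 20, 12)  # same I-immediate as addi/lw, not scaled by 2
    target = ((regs[rs1] + imm) & MASK32) & ~1  # the spec clears bit 0 of the target
    if rd != 0:  # x0 stays hardwired to zero
        regs[rd] = (pc + 4) & MASK32  # return address, written after rs1 was read
    return target
```

Reading Reg[rs1] before writing Reg[rd] matters when rd and rs1 name the same register.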

5. Adding jalr to datapath
[Figure: single-cycle datapath; the write-back mux gains a third input, pc+4 (WBSel = 2)]

6. Adding jalr to datapath
[Figure: single-cycle datapath with the jalr control settings highlighted]
Control settings for jalr: ImmSel = I, RegWEn = 1, BrUn = *, BrEq = *, BrLT = *, ASel = 0 (Reg), BSel = 1 (Imm), ALUSel = Add, MemRW = Read, WBSel = 2 (PC+4), PCSel = ALU

7. Implementing the jal Instruction
- JAL saves PC+4 in Reg[rd] (the return address)
- Sets PC = PC + offset (PC-relative jump)
- Target is somewhere within ±2^19 locations, 2 bytes apart (±2^18 32-bit instructions)
- Immediate encoding is optimized similarly to the branch instruction to reduce hardware cost
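The ±2^19 figure follows from jal's 21-bit, always-even byte offset. The sketch below, with hypothetical helper names, packs and unpacks that offset using the RV32I J-format bit order (imm[20|10:1|11|19:12] stored in inst[31:12]) and derives the reach from it.

```python
JAL_OPCODE = 0b1101111  # RV32I opcode for jal

def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit

def encode_jal(rd: int, offset: int) -> int:
    """Hypothetical assembler helper: jal rd, offset (offset in bytes, even)."""
    if offset % 2 != 0 or not -(1 << 20) <= offset < (1 << 20):
        raise ValueError(f"jal offset {offset} is odd or out of range")
    imm = offset & 0x1FFFFF  # 21-bit two's complement; bit 0 is always 0
    return ((((imm >> 20) & 0x1) << 31)       # imm[20]    -> inst[31]
            | (((imm >> 1) & 0x3FF) << 21)    # imm[10:1]  -> inst[30:21]
            | (((imm >> 11) & 0x1) << 20)     # imm[11]    -> inst[20]
            | (((imm >> 12) & 0xFF) << 12)    # imm[19:12] -> inst[19:12]
            | ((rd & 0x1F) << 7)
            | JAL_OPCODE)

def j_immediate(inst: int) -> int:
    """Undo the bit shuffle and sign-extend the 21-bit byte offset."""
    imm = ((((inst >> 31) & 0x1) << 20)
           | (((inst >> 21) & 0x3FF) << 1)
           | (((inst >> 20) & 0x1) << 11)
           | (((inst >> 12) & 0xFF) << 12))
    return sign_extend(imm, 21)

# Reach of a single jal, derived from the 21-bit even byte offset.
REACH_BYTES = 1 << 20                  # offsets lie in [-2**20, 2**20 - 2]
REACH_HALFWORDS = REACH_BYTES // 2     # ±2**19 locations, 2 bytes apart
REACH_INSTRUCTIONS = REACH_BYTES // 4  # ±2**18 32-bit instructions
```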

8. Adding jal to datapath
[Figure: single-cycle datapath with the same components and control signals as for jalr]

9. Adding jal to datapath
[Figure: single-cycle datapath with the jal control settings highlighted]
Control settings for jal: ImmSel = J, RegWEn = 1, BrUn = *, BrEq = *, BrLT = *, ASel = 1 (PC), BSel = 1 (Imm), ALUSel = Add, MemRW = Read, WBSel = 2 (PC+4), PCSel = ALU

10. “Upper Immediate” Instructions
- Have a 20-bit immediate in the upper 20 bits of the 32-bit instruction word
- One destination register, rd
- Used for two instructions:
  - LUI: Load Upper Immediate (add to zero)
  - AUIPC: Add Upper Immediate to PC
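lui and auipc differ only in what the upper immediate is added to: zero or the PC. A minimal sketch, assuming the standard RV32I opcodes (0b0110111 for lui, 0b0010111 for auipc) and hypothetical helpers `encode_u` and `execute_upper` that return the value written to rd:

```python
MASK32 = 0xFFFFFFFF
LUI_OPCODE = 0b0110111
AUIPC_OPCODE = 0b0010111

def encode_u(opcode: int, rd: int, imm20: int) -> int:
    """U-format: the 20-bit immediate fills inst[31:12]; rd is inst[11:7]."""
    return ((imm20 & 0xFFFFF) << 12) | ((rd & 0x1F) << 7) | opcode

def u_immediate(inst: int) -> int:
    """The upper 20 bits stay in place; the low 12 bits of the value are zero."""
    return inst & 0xFFFFF000

def execute_upper(inst: int, pc: int) -> int:
    """Return the value that lui or auipc writes to rd."""
    opcode = inst & 0x7F
    if opcode == LUI_OPCODE:
        return u_immediate(inst)                  # 0 + imm ("add to zero")
    if opcode == AUIPC_OPCODE:
        return (pc + u_immediate(inst)) & MASK32  # PC + imm, wrapping at 32 bits
    raise ValueError(f"not lui/auipc: opcode {opcode:#09b}")
```

For example, lui with immediate 0x12345 yields 0x12345000, while auipc with the same immediate at PC 0x100 would yield 0x12345100.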

11. Implementing lui
[Figure: single-cycle datapath with the lui control settings highlighted]
Control settings for lui: ImmSel = U, RegWEn = 1, BrUn = *, BrEq = *, BrLT = *, ASel = *, BSel = 1 (Imm), ALUSel = B (pass operand B through), MemRW = Read, WBSel = 1 (ALU), PCSel = pc+4

12. Implementing auipc
[Figure: single-cycle datapath with the auipc control settings highlighted]
Control settings for auipc: ImmSel = U, RegWEn = 1, BrUn = *, BrEq = *, BrLT = *, ASel = 1 (PC), BSel = 1 (Imm), ALUSel = Add, MemRW = Read, WBSel = 1 (ALU), PCSel = pc+4

13. Recap: Complete RV32I ISA
[Figure: RV32I instruction encoding table; instructions not covered in CS61C are marked]
RV32I has 47 instructions in total; 37 instructions are covered in CS61C.

14. Single-Cycle RISC-V RV32I Datapath
[Figure: the complete single-cycle RV32I datapath with control signals ImmSel, RegWEn, BrUn, BrEq, BrLT, ASel, BSel, ALUSel, MemRW, WBSel, PCSel]


16. Processor
[Figure: the processor (Control, plus a Datapath containing the PC, Registers, and Arithmetic & Logic Unit (ALU)) connected to Memory (Program and Data bytes) through the Processor-Memory Interface, with signals Enable?, Read/Write, Address, Write Data, and Read Data]

17. Single-Cycle RISC-V RV32I Datapath
[Figure: the same datapath with a Control Logic block that takes inst[31:0], BrEq, and BrLT and drives ImmSel, RegWEn, BrUn, ASel, BSel, ALUSel, MemRW, WBSel, PCSel]

18. Control Logic Truth Table (incomplete)

Inst[31:0]  BrEq  BrLT  PCSel  ImmSel  BrUn  ASel  BSel  ALUSel  MemRW  RegWEn  WBSel
add         *     *     +4     *       *     Reg   Reg   Add     Read   1       ALU
sub         *     *     +4     *       *     Reg   Reg   Sub     Read   1       ALU
(R-R Op)    *     *     +4     *       *     Reg   Reg   (Op)    Read   1       ALU
addi        *     *     +4     I       *     Reg   Imm   Add     Read   1       ALU
lw          *     *     +4     I       *     Reg   Imm   Add     Read   1       Mem
sw          *     *     +4     S       *     Reg   Imm   Add     Write  0       *
beq         0     *     +4     B       *     PC    Imm   Add     Read   0       *
beq         1     *     ALU    B       *     PC    Imm   Add     Read   0       *
bne         0     *     ALU    B       *     PC    Imm   Add     Read   0       *
bne         1     *     +4     B       *     PC    Imm   Add     Read   0       *
blt         *     1     ALU    B       0     PC    Imm   Add     Read   0       *
bltu        *     1     ALU    B       1     PC    Imm   Add     Read   0       *
jalr        *     *     ALU    I       *     Reg   Imm   Add     Read   1       PC+4
jal         *     *     ALU    J       *     PC    Imm   Add     Read   1       PC+4
auipc       *     *     +4     U       *     PC    Imm   Add     Read   1       ALU
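The truth table can be read as a pure function from (instruction, BrEq, BrLT) to control signals. A minimal Python sketch follows: row values are copied from the table and None stands for *. The not-taken case for blt/bltu (PCSel = +4 when BrLT = 0) is an assumption filling in rows the incomplete table omits, and `Control`/`control` are hypothetical names.

```python
from typing import NamedTuple, Optional

class Control(NamedTuple):
    """One row of the control truth table; None stands for a don't-care (*)."""
    pc_sel: str              # "+4" or "ALU"
    imm_sel: Optional[str]   # "I", "S", "B", "J", "U" or None
    br_un: Optional[int]     # 1 = unsigned branch compare
    a_sel: str               # "Reg" or "PC"
    b_sel: str               # "Reg" or "Imm"
    alu_sel: str
    mem_rw: str              # "Read" or "Write"
    reg_wen: int
    wb_sel: Optional[str]    # "ALU", "Mem", "PC+4" or None

# Rows whose outputs do not depend on BrEq/BrLT (copied from the table).
FIXED_ROWS = {
    "add":   Control("+4",  None, None, "Reg", "Reg", "Add", "Read",  1, "ALU"),
    "sub":   Control("+4",  None, None, "Reg", "Reg", "Sub", "Read",  1, "ALU"),
    "addi":  Control("+4",  "I",  None, "Reg", "Imm", "Add", "Read",  1, "ALU"),
    "lw":    Control("+4",  "I",  None, "Reg", "Imm", "Add", "Read",  1, "Mem"),
    "sw":    Control("+4",  "S",  None, "Reg", "Imm", "Add", "Write", 0, None),
    "jalr":  Control("ALU", "I",  None, "Reg", "Imm", "Add", "Read",  1, "PC+4"),
    "jal":   Control("ALU", "J",  None, "PC",  "Imm", "Add", "Read",  1, "PC+4"),
    "auipc": Control("+4",  "U",  None, "PC",  "Imm", "Add", "Read",  1, "ALU"),
}

# Branches: the controller sets BrUn; the comparator answers with BrEq/BrLT.
BRANCH_BR_UN = {"beq": None, "bne": None, "blt": 0, "bltu": 1}

def branch_taken(inst: str, br_eq: int, br_lt: int) -> bool:
    if inst == "beq":
        return br_eq == 1
    if inst == "bne":
        return br_eq == 0
    return br_lt == 1  # blt and bltu; signedness was already chosen via BrUn

def control(inst: str, br_eq: int = 0, br_lt: int = 0) -> Control:
    """Combinational control: map an instruction plus BrEq/BrLT to its signals."""
    if inst in FIXED_ROWS:
        return FIXED_ROWS[inst]
    if inst in BRANCH_BR_UN:
        pc_sel = "ALU" if branch_taken(inst, br_eq, br_lt) else "+4"
        return Control(pc_sel, "B", BRANCH_BR_UN[inst], "PC", "Imm", "Add", "Read", 0, None)
    raise KeyError(f"instruction not in this sketch: {inst}")
```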

19. Control Realization Options
- ROM (“Read-Only Memory”)
  - Regular structure
  - Can be easily reprogrammed: fix errors, add instructions
  - Popular when designing control logic manually
- Combinational logic
  - Today, chip designers use logic synthesis tools to convert truth tables to networks of gates

20. RV32I, a nine-bit ISA!
The instruction type is encoded using only 9 bits: inst[30], inst[14:12], and inst[6:2].
[Figure: RV32I encoding table with the inst[30], inst[14:12], and inst[6:2] columns highlighted; instructions not in CS61C are marked]
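Extracting those 9 bits is a few shifts and masks. A sketch, assuming standard RV32I encodings for the sample instructions; `nine_bit_key` is a hypothetical name. Note that for I-type instructions such as addi, inst[30] is an immediate bit, so a real controller would treat it as a don't-care there.

```python
def nine_bit_key(inst: int) -> int:
    """Concatenate inst[30], inst[14:12] and inst[6:2] into a 9-bit value."""
    bit30 = (inst >> 30) & 0x1
    funct3 = (inst >> 12) & 0x7     # inst[14:12]
    opcode_hi = (inst >> 2) & 0x1F  # inst[6:2]; inst[1:0] is always 0b11 for 32-bit instructions
    return (bit30 << 8) | (funct3 << 5) | opcode_hi

ADD_X1_X2_X3 = 0x003100B3  # add x1, x2, x3
SUB_X1_X2_X3 = 0x403100B3  # sub x1, x2, x3 (differs from add only in inst[30])
LW_X1_0_X2 = 0x00012083    # lw x1, 0(x2)
```

add and sub share funct3 and opcode, so inst[30] is the only bit of the key that tells them apart.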

21. ROM-based Control
A ROM with an 11-bit address (inputs) and 15 data bits (outputs):
- Inputs: Inst[30,14:12,6:2] (9 bits), BrEq, BrLT
- Outputs: PCSel, ImmSel[2:0] (3 bits), BrUn, ASel, BSel, ALUSel[3:0] (4 bits), MemRW, RegWEn, WBSel[1:0] (2 bits)

22. ROM Controller Implementation
[Figure: the address (Inst[], BrEq, BrLT) drives an address decoder that selects one row of the ROM (the control word for add, sub, or, ..., jal); the selected control word is the controller output (PCSel, ImmSel, …)]
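A sketch of the ROM controller from the two slides above, modeling the ROM as a Python list indexed by the 11-bit address. The field widths come from the slide; the bit order of the address and of the control word, and the numeric codes (PCSel +4 = 0, ASel Reg = 0, BSel Reg = 0, ALUSel Add = 0, MemRW Read = 0), are assumptions. WBSel ALU = 1 matches the lui/auipc slides. All names are hypothetical.

```python
# Output fields and widths from the ROM slide: 1+3+1+1+1+4+1+1+2 = 15 bits.
FIELDS = [("PCSel", 1), ("ImmSel", 3), ("BrUn", 1), ("ASel", 1), ("BSel", 1),
          ("ALUSel", 4), ("MemRW", 1), ("RegWEn", 1), ("WBSel", 2)]
WORD_BITS = sum(width for _, width in FIELDS)
ADDR_BITS = 9 + 2  # inst[30], inst[14:12], inst[6:2] plus BrEq and BrLT

def rom_address(key9: int, br_eq: int, br_lt: int) -> int:
    """Form the 11-bit ROM address (bit order is an assumption)."""
    return (key9 << 2) | (br_eq << 1) | br_lt

def pack(signals: dict[str, int]) -> int:
    """Pack named signals into one control word, PCSel in the most significant bits."""
    word = 0
    for name, width in FIELDS:
        value = signals.get(name, 0)  # don't-care (*) signals are stored as 0
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word

def unpack(word: int) -> dict[str, int]:
    """Split a control word back into named signals."""
    signals = {}
    for name, width in reversed(FIELDS):
        signals[name] = word & ((1 << width) - 1)
        word >>= width
    return signals

# The ROM: one 15-bit control word per address, 2**11 rows in total.
rom = [0] * (1 << ADDR_BITS)

# Fill the rows for add (hypothetical numeric codes, see the lead-in).
ADD_KEY = 0b0_000_01100  # inst[30] = 0, funct3 = 000, inst[6:2] = 01100
ADD_WORD = pack({"PCSel": 0, "ASel": 0, "BSel": 0, "ALUSel": 0,
                 "MemRW": 0, "RegWEn": 1, "WBSel": 1})
for br_eq in (0, 1):
    for br_lt in (0, 1):  # BrEq/BrLT are don't-cares for add: fill all 4 rows
        rom[rom_address(ADD_KEY, br_eq, br_lt)] = ADD_WORD

def controller(key9: int, br_eq: int, br_lt: int) -> dict[str, int]:
    """Address decoder plus ROM read: select one row and decode its fields."""
    return unpack(rom[rom_address(key9, br_eq, br_lt)])
```

The whole ROM holds 2^11 × 15 = 30,720 bits. Because add ignores BrEq and BrLT, its control word must be replicated in all four rows that share its 9-bit key.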

23. Administrivia
- Homework 2 due tomorrow, 11:59 pm
- Project 1: Part 1 due Monday, Oct. 9; Part 2 due Monday, Oct. 16
- Midterm 1 regrades due next Tuesday
- Talk to a TA if you don’t understand a midterm question or are unsure about a regrade

24. Break! (10/5/17)


26. Instruction Timing

Stage  Block     Time
IF     I-MEM     200 ps
ID     Reg Read  100 ps
EX     ALU       200 ps
MEM    D-MEM     200 ps
WB     Reg W     100 ps
Total            800 ps

27. Instruction Timing
- Maximum clock frequency: f_max = 1/800 ps = 1.25 GHz
- Most blocks are idle most of the time, e.g. f_max,ALU = 1/200 ps = 5 GHz!
- How can we keep the ALU busy all the time, doing 5 billion adds/sec rather than just 1.25 billion?
- Idea: factories use three employee shifts, so the equipment is always busy!

Instr  IF=200ps  ID=100ps  ALU=200ps  MEM=200ps  WB=100ps  Total
add    X         X         X                     X         600ps
beq    X         X         X                               500ps
jal    X         X         X                               500ps
lw     X         X         X          X          X         800ps
sw     X         X         X          X                    700ps
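The slide's arithmetic can be reproduced in a few lines of Python. This sketch assumes the stage usage marked in the table for add, beq, lw, and sw (jal is left out) and converts a period in picoseconds to GHz as 1000 / period_ps; the helper names are hypothetical.

```python
# Stage latencies in picoseconds, from the instruction-timing table.
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

# Stages each instruction actually uses (EX is the ALU column).
STAGES_USED = {
    "add": ("IF", "ID", "EX", "WB"),
    "beq": ("IF", "ID", "EX"),
    "lw":  ("IF", "ID", "EX", "MEM", "WB"),
    "sw":  ("IF", "ID", "EX", "MEM"),
}

def latency_ps(instr: str) -> int:
    """Time one instruction would need if only its own stages counted."""
    return sum(STAGE_PS[stage] for stage in STAGES_USED[instr])

def ghz(period_ps: float) -> float:
    """Frequency in GHz for a period in ps (1 ns = 1000 ps)."""
    return 1000 / period_ps

# In a single-cycle design every instruction gets the same clock period,
# so the period must cover the slowest instruction (lw).
clock_period_ps = max(latency_ps(instr) for instr in STAGES_USED)
f_max_ghz = ghz(clock_period_ps)                      # 1/800 ps
alu_limit_ghz = ghz(STAGE_PS["EX"])                   # 1/200 ps if the ALU alone set the pace
alu_busy_fraction = STAGE_PS["EX"] / clock_period_ps  # ALU active 200 of every 800 ps
```

Only lw needs all five stages, so every other instruction leaves part of each 800 ps cycle unused, and the ALU sits idle three quarters of the time.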


29. Performance Measures
“Our” RISC-V executes instructions at 1.25 GHz: 1 instruction every 800 ps.
Can we improve its performance? What do we mean by this statement? It is not so obvious:
- Quicker response time, so one job finishes faster?
- More jobs per unit time (e.g. a web server returning pages)?
- Longer battery life?