RISC-V Processor Datapath

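This lecture introduces the RISC-V datapath: hardware that can execute any RV32I instruction in a single clock cycle, even though no single instruction uses every hardware unit. It covers the five phases of instruction execution and the control signals that specify how each instruction is executed, adding instructions to the datapath one at a time.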
1.CS 61C: Great Ideas in Computer Architecture
Lecture 11: RISC-V Processor Datapath
Krste Asanović & Randy Katz
http://inst.eecs.berkeley.edu/~cs61c/fa17

2.Recap: Complete RV32I ISA
[Figure: listing of the complete RV32I instruction set, with an annotation reading "Not in CS61C"]

3.State Required by RV32I ISA
Each instruction reads and updates this state during execution:
- Registers (x0..x31)
    - Register file (or regfile) Reg holds 32 registers x 32 bits/register: Reg[0]..Reg[31]
    - First register read specified by rs1 field in instruction
    - Second register read specified by rs2 field in instruction
    - Write register (destination) specified by rd field in instruction
    - x0 is always 0 (writes to Reg[0] are ignored)
- Program Counter (PC)
    - Holds address of current instruction
- Memory (MEM)
    - Holds both instructions & data, in one 32-bit byte-addressed memory space
    - We’ll use separate memories for instructions (IMEM) and data (DMEM)
    - Later we’ll replace these with instruction and data caches
    - Instructions are read (fetched) from instruction memory (assume IMEM read-only)
    - Load/store instructions access data memory
10/3/17
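The state listed on this slide can be sketched in a few lines of Python. This is only an illustrative model, not part of the lecture: the class names `RegFile` and `MachineState`, and the choice of a dict for data memory, are assumptions made for the example.

```python
class RegFile:
    """32 registers x 32 bits; x0 always reads 0 and ignores writes."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, idx):
        return self.regs[idx]

    def write(self, idx, value):
        if idx != 0:  # writes to Reg[0] are ignored
            self.regs[idx] = value & 0xFFFFFFFF  # keep only 32 bits


class MachineState:
    """Architectural state: register file, PC, IMEM and DMEM."""

    def __init__(self, program_words):
        self.reg = RegFile()
        self.pc = 0                      # address of the current instruction
        self.imem = list(program_words)  # instruction memory, treated as read-only
        self.dmem = {}                   # sparse data memory (assumed layout: address -> value)
```

Keeping the x0 rule inside `write` means no other part of a simulator has to special-case register 0.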

4.One-Instruction-Per-Cycle RISC-V Machine
On every tick of the clock, the computer executes one instruction.
Current state outputs drive the inputs to the combinational logic, whose outputs settle at the next-state values before the next clock edge.
At the rising clock edge, all the state elements are updated with the combinational logic outputs, and execution moves to the next clock cycle.
[Diagram: state elements pc, Reg[], IMEM, and DMEM connected to a block of Combinational Logic, all driven by one clock]
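The clocking discipline above can be modeled as a pure next-state function plus a commit step. The sketch below is a toy under stated assumptions: the function names are hypothetical, and the combinational logic is reduced to the PC + 4 update so that the two-phase structure stays visible.

```python
def comb_logic(state):
    """Combinational logic: a pure function of the current state.

    Toy stand-in that only computes PC + 4; a full datapath would also
    compute register and memory updates here.
    """
    return {"pc": (state["pc"] + 4) & 0xFFFFFFFF}


def clock_edge(state):
    """Rising clock edge: all state elements take their new values together."""
    next_values = comb_logic(state)  # outputs have settled before the edge
    new_state = dict(state)
    new_state.update(next_values)    # commit every update at once
    return new_state


state = {"pc": 0x1000}
state = clock_edge(state)  # first cycle
state = clock_edge(state)  # second cycle
```

Because `comb_logic` never mutates `state`, every value it computes is based on the state from before the edge, which mirrors how the hardware behaves.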

5.Basic Phases of Instruction Execution
    1. Instruction Fetch
    2. Decode/Register Read
    3. Execute
    4. Memory
    5. Register Write
[Diagram: PC with +4 adder and mux, IMEM, Reg[] (rs1, rs2, rd), imm, ALU, and DMEM, laid out across the five phases along the clock time axis]

6.Implementing the add instruction
add rd, rs1, rs2
Instruction makes two changes to machine’s state:
    Reg[rd] = Reg[rs1] + Reg[rs2]
    PC = PC + 4

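7.Datapath for add
[Diagram: pc feeds IMEM and a +4 adder, whose output pc+4 returns to pc. IMEM outputs inst[31:0]; inst[11:7] drives AddrD, inst[19:15] drives AddrA, and inst[24:20] drives AddrB of Reg[]. DataA = Reg[rs1] and DataB = Reg[rs2] feed an adder whose output alu drives DataD. Control Logic drives RegWriteEnable (RegWEn).]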
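As a rough software analogue of this datapath, the sketch below pulls rd, rs1 and rs2 out of the instruction word and applies the two state changes made by add. The helper names `bits` and `execute_add` are hypothetical, and the register file is assumed to be a plain 32-entry list.

```python
MASK = 0xFFFFFFFF


def bits(inst, hi, lo):
    """Return inst[hi:lo] as an unsigned integer."""
    return (inst >> lo) & ((1 << (hi - lo + 1)) - 1)


def execute_add(inst, reg, pc):
    """Apply add rd, rs1, rs2 to the register list; return the next PC."""
    rd = bits(inst, 11, 7)     # AddrD
    rs1 = bits(inst, 19, 15)   # AddrA
    rs2 = bits(inst, 24, 20)   # AddrB
    alu = (reg[rs1] + reg[rs2]) & MASK
    if rd != 0:                # x0 stays 0
        reg[rd] = alu          # RegWEn = 1
    return (pc + 4) & MASK


reg = [0] * 32
reg[2], reg[3] = 5, 7
next_pc = execute_add(0x003100B3, reg, 0x1000)  # add x1,x2,x3
```

The word 0x003100B3 is the RV32I encoding of `add x1,x2,x3`.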

8.Timing Diagram for add
Signal        Cycle 1           Cycle 2
PC            1000              1004
PC+4          1004              1008
inst[31:0]    add x1,x2,x3      add x6,x7,x9
Reg[rs1]      Reg[2]            Reg[7]
Reg[rs2]      Reg[3]            Reg[9]
alu           Reg[2]+Reg[3]     Reg[7]+Reg[9]
Reg[1]        ???               Reg[2]+Reg[3]
[Diagram: the add datapath from slide 7, shown next to the clock waveform]

9.Implementing the sub instruction
sub rd, rs1, rs2
Almost the same as add, except that the operands are now subtracted instead of added.
inst[30] selects between add and subtract.

10.Datapath for add/sub
[Diagram: the add datapath with the adder replaced by an ALU. Control Logic drives RegWEn (1=write, 0=no write) and ALUSel (Add=0/Sub=1).]
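A small sketch of how inst[30] could drive ALUSel for these two instructions. The function names are made up for the example; the 0/1 encoding follows the slide (Add=0/Sub=1).

```python
MASK = 0xFFFFFFFF


def alu_sel_from_inst(inst):
    """inst[30] distinguishes sub (1) from add (0)."""
    return (inst >> 30) & 1


def alu(a, b, alu_sel):
    """Two-function ALU: ALUSel=0 adds, ALUSel=1 subtracts (32-bit wraparound)."""
    if alu_sel == 0:
        return (a + b) & MASK
    return (a - b) & MASK


add_inst = 0x003100B3  # add x1,x2,x3
sub_inst = 0x403100B3  # sub x1,x2,x3 (same word with bit 30 set)
```

The two encodings differ only in bit 30, which is why a single control bit is enough at this stage.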

11.Implementing other R-Format instructions
All implemented by decoding funct3 and funct7 fields and selecting appropriate ALU function.
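One way to picture this decoding is a lookup table keyed on (funct7, funct3). The sketch below uses the standard RV32I R-format encodings; the table name `R_OPS` and the function `r_type_alu` are hypothetical, and a hardware ALU would use a mux rather than a dictionary.

```python
MASK = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


# (funct7, funct3) -> ALU function for the RV32I R-format instructions
R_OPS = {
    (0x00, 0b000): lambda a, b: a + b,                          # add
    (0x20, 0b000): lambda a, b: a - b,                          # sub
    (0x00, 0b001): lambda a, b: a << (b & 31),                  # sll
    (0x00, 0b010): lambda a, b: int(to_signed(a) < to_signed(b)),  # slt
    (0x00, 0b011): lambda a, b: int(a < b),                     # sltu
    (0x00, 0b100): lambda a, b: a ^ b,                          # xor
    (0x00, 0b101): lambda a, b: a >> (b & 31),                  # srl
    (0x20, 0b101): lambda a, b: to_signed(a) >> (b & 31),       # sra
    (0x00, 0b110): lambda a, b: a | b,                          # or
    (0x00, 0b111): lambda a, b: a & b,                          # and
}


def r_type_alu(inst, a, b):
    """Select the ALU function from funct7/funct3 and apply it to a and b."""
    funct3 = (inst >> 12) & 0x7
    funct7 = (inst >> 25) & 0x7F
    return R_OPS[(funct7, funct3)](a & MASK, b & MASK) & MASK
```

For example, `r_type_alu(0x40005033, 0x80000000, 4)` takes the sra entry and shifts in copies of the sign bit.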

12.Implementing the addi instruction
RISC-V Assembly Instruction: addi x15,x1,-50
imm[11:0]       rs1      funct3   rd       opcode
111111001110    00001    000      01111    0010011
imm=-50         rs1=1    ADD      rd=15    OP-Imm
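The bit pattern on this slide can be reproduced by packing the I-format fields by hand. This is a minimal sketch; `encode_i_type` is a hypothetical helper, not an existing assembler API.

```python
def encode_i_type(imm, rs1, funct3, rd, opcode):
    """Pack I-format fields: imm[11:0] | rs1 | funct3 | rd | opcode."""
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


# addi x15,x1,-50
inst = encode_i_type(-50, rs1=1, funct3=0b000, rd=15, opcode=0b0010011)

# Split the word back into its fields, printed as binary strings.
fields = (
    f"{inst >> 20:012b} {(inst >> 15) & 31:05b} {(inst >> 12) & 7:03b} "
    f"{(inst >> 7) & 31:05b} {inst & 127:07b}"
)
```

Masking the immediate with 0xFFF is what turns -50 into its 12-bit two's-complement form, 111111001110.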

13.Datapath for add/sub
[Diagram repeated from slide 10: RegWEn (1=write, 0=no write), ALUSel (Add=0/Sub=1)]

14.Adding addi to datapath
[Diagram: an Imm. Gen block takes inst[31:20] and produces imm[31:0]. A mux on the ALU's B input selects Reg[rs2] (0) or imm[31:0] (1). Control settings for addi: ImmSel=I, BSel=1, ALUSel=Add, RegWEn=1.]

15.I-Format immediates
High 12 bits of instruction (inst[31:20]) copied to low 12 bits of immediate (imm[11:0]).
Immediate is sign-extended by copying value of inst[31] to fill the upper 20 bits of the immediate value (imm[31:12]).
[Diagram: Imm. Gen with ImmSel=I maps inst[31:20] to imm[31:0]; the upper bits of imm are copies of inst[31] (sign-extension) and the low bits are inst[30:20]]
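The two rules above translate directly into a few bit operations. The function name `imm_gen_i` is an assumption for this sketch; the result is returned as an unsigned 32-bit pattern.

```python
def imm_gen_i(inst):
    """Build the 32-bit I-format immediate from a 32-bit instruction word."""
    imm = (inst >> 20) & 0xFFF   # inst[31:20] -> imm[11:0]
    if inst & 0x80000000:        # inst[31] is the sign bit
        imm |= 0xFFFFF000        # copy it into imm[31:12]
    return imm
```

For `addi x15,x1,-50` (0xFCE08793) this yields 0xFFFFFFCE, the 32-bit two's-complement form of -50.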

16.Adding addi to datapath
[Diagram repeated from slide 14: ImmSel=I, BSel=1, ALUSel=Add, RegWEn=1]
Also works for all other I-format arithmetic instructions (slti, sltiu, andi, ori, xori, slli, srli, srai) just by changing ALUSel.

17.TSMC Announces 3nm CMOS Fab
Latest Apple iPhone 8 and iPhone X use TSMC’s 10nm process technology.
3nm technology should allow about 10x more stuff on the same sized chip: (10/3)^2.
The new manufacturing plant will occupy nearly 200 acres, cost around $15B, and open in around 5 years (~2022).
Currently, fabs use 193nm light to expose masks.
For 3nm, some layers will use Extreme Ultra-Violet (13.5nm).

18.Break!

19.Implementing Load Word instruction
RISC-V Assembly Instruction: lw x14, 8(x2)
imm[11:0]       rs1      funct3   rd       opcode
000000001000    00010    010      01110    0000011
imm=+8          rs1=2    LW       rd=14    LOAD

20.Adding addi to datapath
[Diagram repeated from slide 14: ImmSel=I, BSel=1, ALUSel=Add, RegWEn=1]

21.Adding lw to datapath
[Diagram: DMEM is added after the ALU. The ALU output alu drives Addr, and DataR returns mem. A write-back mux selects between mem and alu to produce wb, which drives DataD of Reg[]. Control signals: ImmSel, RegWEn, BSel, ALUSel, MemRW, WBSel.]

22.Adding lw to datapath
[Diagram repeated from slide 21, with the control settings for lw: ImmSel=I, RegWEn=1, BSel=1, ALUSel=Add, MemRW=Read, WBSel=0]
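The lw control settings can be traced in software as follows. This is a simplified sketch: `execute_lw` is a hypothetical name, the register file is a plain list, and data memory is assumed to be a dict mapping an address to a whole 32-bit word, so byte layout is ignored.

```python
MASK = 0xFFFFFFFF


def execute_lw(inst, reg, dmem, pc):
    """Apply lw rd, imm(rs1); return the next PC."""
    rd = (inst >> 7) & 0x1F
    rs1 = (inst >> 15) & 0x1F
    imm = (inst >> 20) & 0xFFF
    if imm & 0x800:                 # ImmSel=I: sign-extend the 12-bit immediate
        imm -= 0x1000
    alu = (reg[rs1] + imm) & MASK   # BSel=1 picks imm, ALUSel=Add
    mem = dmem.get(alu, 0)          # MemRW=Read: DataR for address alu
    wb = mem                        # WBSel=0 picks mem over alu
    if rd != 0:
        reg[rd] = wb                # RegWEn=1
    return (pc + 4) & MASK


reg = [0] * 32
reg[2] = 0x100
dmem = {0x108: 0xDEADBEEF}
next_pc = execute_lw(0x00812703, reg, dmem, 0x2000)  # lw x14, 8(x2)
```

Here the ALU is used only to form the address Reg[rs1] + imm; the value written back comes from memory.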

23.All RV32 Load Instructions
[Table: encodings of the RV32 load instructions]
funct3 field encodes size and signedness of load data.
Supporting the narrower loads requires additional circuits to extract the correct byte/halfword from the value loaded from memory, and sign- or zero-extend the result to 32 bits before writing back to register file.
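The extra extraction circuit can be sketched as below. It assumes the standard RV32I funct3 encodings (lb=000, lh=001, lw=010, lbu=100, lhu=101), little-endian byte order, and naturally aligned accesses; the function name `load_extract` is hypothetical.

```python
MASK = 0xFFFFFFFF


def load_extract(word, byte_offset, funct3):
    """Pick a byte/halfword/word out of a 32-bit memory word and extend it.

    word        : 32-bit value read from memory
    byte_offset : low two address bits (0..3), assumed aligned to the access size
    funct3      : low 2 bits give the size, bit 2 set means zero-extend
    """
    size = funct3 & 0b11            # 0 = byte, 1 = halfword, 2 = word
    zero_extend = (funct3 >> 2) & 1
    nbits = 8 << size               # 8, 16 or 32
    value = (word >> (8 * byte_offset)) & ((1 << nbits) - 1)
    if not zero_extend and (value >> (nbits - 1)) & 1:
        value |= (MASK << nbits) & MASK  # fill the upper bits with the sign bit
    return value
```

For instance, with the memory word 0x12348F80, an lb at offset 0 returns 0xFFFFFF80 while an lbu at the same offset returns 0x00000080.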

24.Implementing Store Word instruction
RISC-V Assembly Instruction: sw x14, 8(x2)
imm[11:5]        rs2      rs1      funct3   imm[4:0]        opcode
0000000          01110    00010    010      01000           0100011
offset[11:5]=0   rs2=14   rs1=2    SW       offset[4:0]=8   STORE
combined 12-bit offset = 0000000 01000 = 8
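The split offset can be checked by packing the S-format fields by hand. As before, `encode_s_type` is a hypothetical helper written for this example.

```python
def encode_s_type(imm, rs2, rs1, funct3, opcode):
    """Pack S-format fields: imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode."""
    imm &= 0xFFF
    return (
        ((imm >> 5) << 25)       # imm[11:5] -> inst[31:25]
        | (rs2 << 20)
        | (rs1 << 15)
        | (funct3 << 12)
        | ((imm & 0x1F) << 7)    # imm[4:0] -> inst[11:7]
        | opcode
    )


# sw x14, 8(x2)
inst = encode_s_type(8, rs2=14, rs1=2, funct3=0b010, opcode=0b0100011)

fields = (
    f"{inst >> 25:07b} {(inst >> 20) & 31:05b} {(inst >> 15) & 31:05b} "
    f"{(inst >> 12) & 7:03b} {(inst >> 7) & 31:05b} {inst & 127:07b}"
)
```

The low five offset bits land where rd sits in other formats, which keeps rs1 and rs2 in the same positions as in R-format instructions.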

25.Adding lw to datapath
[Diagram repeated from slide 21]

26.Adding sw to datapath
[Diagram: the lw datapath with a DataW input added to DMEM, driven by Reg[rs2]. Imm. Gen now takes inst[31:7]. Control signals: ImmSel, RegWEn, BSel, ALUSel, MemRW, WBSel.]

27.Adding sw to datapath
[Diagram repeated from slide 26, with the control settings for sw: ImmSel=S, RegWEn=0, BSel=1, ALUSel=Add, MemRW=Write, WBSel=*]
* = “Don’t Care”
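The control settings given on the slides for lw and sw can be collected into a small table that drives one shared routine. This is a sketch under assumptions: the names `CONTROL` and `step` are invented here, fields arrive already decoded, `None` stands in for the don't-care value, and data memory is a dict of whole words.

```python
MASK = 0xFFFFFFFF

# Control settings as listed on the lw and sw slides (None = don't care).
CONTROL = {
    "lw": dict(ImmSel="I", RegWEn=1, BSel=1, ALUSel="Add", MemRW="Read", WBSel=0),
    "sw": dict(ImmSel="S", RegWEn=0, BSel=1, ALUSel="Add", MemRW="Write", WBSel=None),
}


def step(op, reg, dmem, rs1, rs2, rd, imm):
    """Run the execute, memory and write-back phases for one lw or sw."""
    c = CONTROL[op]
    b = imm if c["BSel"] == 1 else reg[rs2]  # B mux: 1 selects the immediate
    alu = (reg[rs1] + b) & MASK              # ALUSel=Add for both instructions
    if c["MemRW"] == "Write":
        dmem[alu] = reg[rs2]                 # DataW comes from Reg[rs2]
    mem = dmem.get(alu, 0)                   # DataR
    wb = mem if c["WBSel"] == 0 else alu     # write-back mux
    if c["RegWEn"] and rd != 0:
        reg[rd] = wb


reg = [0] * 32
reg[2], reg[14] = 0x100, 0xCAFE
dmem = {}
# For sw x14, 8(x2), bits inst[11:7] hold imm[4:0] = 8, so "rd" reads as 8.
step("sw", reg, dmem, rs1=2, rs2=14, rd=8, imm=8)
step("lw", reg, dmem, rs1=2, rs2=0, rd=15, imm=8)
```

The sw call shows why RegWEn must be 0: the register file still sees an address on AddrD, but nothing is written to x8.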

28.I-Format immediates
[Slide repeated from slide 15]

29.I & S Immediate Generator
Instruction formats (inst[31:0]):
      31..25       24..20       19..15   14..12   11..7       6..0
I     imm[11:0] (inst[31:20])   rs1      funct3   rd          I-opcode
S     imm[11:5]    rs2          rs1      funct3   imm[4:0]    S-opcode
Immediate (imm[31:0]):
      imm[31:11]                  imm[10:5]     imm[4:0]
I     inst[31] (sign-extension)   inst[30:25]   inst[24:20]
S     inst[31] (sign-extension)   inst[30:25]   inst[11:7]
Just need a 5-bit mux to select between two positions where low five bits of immediate can reside in instruction.
Other bits in immediate are wired to fixed positions in instruction.
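The generator described above amounts to one 5-bit selection plus fixed wiring. In the sketch below the function name `imm_gen` and the string values "I" and "S" for ImmSel are assumptions made for illustration.

```python
def imm_gen(inst, imm_sel):
    """Build imm[31:0] for I- or S-format from a 32-bit instruction word."""
    # The 5-bit mux: the only part that depends on ImmSel.
    if imm_sel == "I":
        low5 = (inst >> 20) & 0x1F   # inst[24:20]
    else:
        low5 = (inst >> 7) & 0x1F    # inst[11:7]
    mid6 = (inst >> 25) & 0x3F       # inst[30:25] -> imm[10:5], fixed wiring
    imm = (mid6 << 5) | low5
    if (inst >> 31) & 1:             # inst[31] -> imm[31:11], sign-extension
        imm |= 0xFFFFF800
    return imm
```

Using the instruction words from the earlier slides, `imm_gen(0xFCE08793, "I")` gives 0xFFFFFFCE (-50) and `imm_gen(0x00E12423, "S")` gives 8.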