OS’s Responsibilities


1. ECE468 Computer Organization and Architecture
OS’s Responsibilities
ECE4680 buses.1 April 5, 2003

Recap: Summary of Bus Options
°Option: High performance | Low cost
°Bus width: Separate address & data lines | Multiplexed address & data lines
°Data width: Wider is faster (e.g., 32 bits) | Narrower is cheaper (e.g., 8 bits)
°Transfer size: Multiple words have less bus overhead | Single-word transfer is simpler
°Bus masters: Multiple (requires arbitration) | Single master (no arbitration)
°Clocking: Synchronous | Asynchronous
°Protocol: Pipelined | Serial

2. I/O System Design Issues
[Figure: the processor and its cache sit on the memory-I/O bus together with main memory and three I/O controllers, which serve disks, graphics, and a network; the I/O controllers send interrupts to the processor.]

Operating System Requirements (§8.5)
°Provide protection to shared I/O resources
 • Guarantee that a user’s program can access only the portions of an I/O device to which the user has rights
°Provide abstraction for accessing devices:
 • Supply routines that handle low-level device operations
°Handle the interrupts generated by I/O devices
°Provide equitable access to the shared I/O resources
 • All user programs must have equal access to the I/O resources
°Schedule accesses in order to enhance system throughput

3. OS and I/O Systems Communication Requirements
°The Operating System must be able to prevent:
 • The user program from communicating with the I/O device directly
°If user programs could perform I/O directly:
 • Protection of the shared I/O resources could not be provided
°Three types of communication are required:
 • The OS must be able to give commands to the I/O devices
 • The I/O device must be able to notify the OS when it has completed an operation or has encountered an error
 • Data must be transferred between memory and an I/O device

Giving Commands to I/O Devices
°Two methods are used to address the device:
 • Special I/O instructions
 • Memory-mapped I/O
°Special I/O instructions specify:
 • Both the device number and the command word
  - Device number: the processor communicates this via a set of wires normally included as part of the I/O bus
  - Command word: this is usually sent on the bus’s data lines
°Memory-mapped I/O:
 • Portions of the address space are assigned to I/O devices
 • Reads and writes to those addresses are interpreted as commands to the I/O devices
 • User programs are prevented from issuing I/O operations directly:
  - The I/O address space is protected by the address translation
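The memory-mapped I/O idea above can be sketched as a toy bus model in Python. This is only an illustration under stated assumptions: the address window `IO_BASE`..`IO_BASE + IO_SIZE`, the `Bus` class, and the `supervisor` flag (standing in for the protection that address translation provides) are all hypothetical and do not come from the slides.

```python
# Hypothetical layout (illustration only): 4 addresses starting at 0xF000
# are assigned to one I/O device; everything else is ordinary memory.
IO_BASE = 0xF000
IO_SIZE = 4


class Bus:
    """Toy model of address decoding for memory-mapped I/O."""

    def __init__(self):
        self.ram = {}               # ordinary memory: address -> value
        self.device_commands = []   # (register offset, command word) seen by the device

    def store(self, addr, value, supervisor=False):
        """Store a value; stores into the I/O window become device commands."""
        if IO_BASE <= addr < IO_BASE + IO_SIZE:
            # Stand-in for address-translation protection: only the OS
            # (supervisor mode) may touch the I/O address space.
            if not supervisor:
                raise PermissionError("user program may not access I/O addresses")
            self.device_commands.append((addr - IO_BASE, value))
        else:
            self.ram[addr] = value


bus = Bus()
bus.store(0x0100, 7)                      # ordinary memory write
bus.store(0xF001, 0x2A, supervisor=True)  # interpreted as a command to the device
```

A user-mode store to `0xF001` (the default `supervisor=False`) raises `PermissionError` in this sketch, which mirrors the slide's point that user programs cannot issue I/O operations directly.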

4. I/O Device Notifying the OS
°The OS needs to know when:
 • The I/O device has completed an operation
 • The I/O operation has encountered an error
°This can be accomplished in two different ways:
 • Polling:
  - The I/O device puts information in a status register
  - The OS periodically checks the status register
 • I/O Interrupt:
  - Whenever an I/O device needs attention from the processor, it interrupts the processor from what it is currently doing.

Polling: Programmed I/O
[Figure: CPU, memory, and I/O controller (IOC) with its device. Flowchart: Is the data ready? If no, loop (busy wait). If yes, read data, store data, then check "done?"; if not done, repeat.]
°A busy-wait loop is not an efficient way to use the CPU unless the device is very fast, but checks for I/O device completion can be dispersed among computation-intensive code
°Advantage:
 • Simple: the processor is totally in control and does all the work
°Disadvantage:
 • Polling overhead can consume a lot of CPU time
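The polling flowchart above can be sketched in Python with a simulated device. The `Device` class, its `ready_after` parameter, and `poll_and_read` are hypothetical names made up for this sketch; a real status register would be read from hardware rather than from a Python object.

```python
class Device:
    """Simulated I/O device whose data becomes ready after a few status checks."""

    def __init__(self, data, ready_after):
        self._data = data
        self._countdown = ready_after  # status reads that still report "not ready"

    def status(self):
        """Read the status register: True once the data is ready."""
        if self._countdown > 0:
            self._countdown -= 1
            return False
        return True

    def read(self):
        """Read the data register."""
        return self._data


def poll_and_read(device):
    """Busy-wait until the device is ready, then read one datum.

    Returns the datum and the number of status checks that found
    the device not ready (CPU work that produced nothing).
    """
    wasted_checks = 0
    while not device.status():
        wasted_checks += 1
    return device.read(), wasted_checks


value, wasted = poll_and_read(Device(data=0x41, ready_after=3))
```

In this run the CPU performs three fruitless status checks before the read succeeds, which is the overhead the slide's disadvantage bullet refers to.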

5. Example 1: Overhead of polling (page 676)
°Assume that the number of clock cycles for a polling operation, including transferring to the polling routine, accessing the device, and restarting the user program, is 400, and that the processor executes with a 500 MHz clock
°Determine the fraction of CPU time consumed for the following three cases, assuming that you poll often enough that no data is ever lost and that the devices are potentially always busy:
 • The mouse must be polled 30 times per second
 • The floppy disk transfers data to the CPU in 16-bit units and has a data rate of 50 KB/sec
 • The hard disk transfers data in four-word chunks and can transfer at 4 MB/sec
°Results:
 • For the mouse, fraction = 30 × 400 ÷ 500 MHz = 0.0024% ≈ 0.002%
 • For the floppy disk, polling rate = 50 KB/s ÷ 2 B = 25K/s, fraction = 25K × 400 ÷ 500 MHz = 2%
 • For the hard disk, polling rate = 4 MB/s ÷ 16 B = 250K/s, fraction = 250K × 400 ÷ 500 MHz = 20%
 • So for faster I/O devices, polling wastes more CPU time. Worse, the CPU keeps polling even when the I/O device is not busy.

Interrupt Driven Data Transfer
[Figure: the CPU runs a user program (add, sub, and, or, nop). (1) An I/O interrupt arrives; (2) the PC is saved; (3) control transfers to the interrupt service address, where the interrupt service routine (read, store, ..., rti) runs; (4) rti returns to the user program. Memory, the IOC, and the device are shown alongside the CPU.]
°Advantage:
 • User program progress is only halted during actual transfer
°Disadvantage: special hardware is needed to:
 • Cause an interrupt (I/O device)
 • Detect an interrupt (processor)
 • Save the proper states to resume after the interrupt (processor)
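The arithmetic of Example 1 can be checked with a few lines of Python. The function name `polling_fraction` is made up for this sketch, and it assumes, as the slide's 25K/s and 250K/s figures do, that 1 KB = 1000 bytes and 1 MB = 1,000,000 bytes.

```python
def polling_fraction(polls_per_sec, cycles_per_poll=400, clock_hz=500e6):
    """Fraction of CPU time spent polling a device at the given rate."""
    return polls_per_sec * cycles_per_poll / clock_hz


# Polling rates from Example 1 (decimal KB/MB assumed, as on the slide).
mouse = polling_fraction(30)               # 30 polls per second
floppy = polling_fraction(50_000 / 2)      # 50 KB/s in 2-byte units
disk = polling_fraction(4_000_000 / 16)    # 4 MB/s in 16-byte (four-word) chunks

for name, frac in [("mouse", mouse), ("floppy", floppy), ("hard disk", disk)]:
    print(f"{name}: {frac * 100:.4f}% of CPU time")
```

The mouse comes out at 0.0024%, which the slide rounds to 0.002%; the floppy and hard disk match the slide's 2% and 20%.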

6. I/O Interrupt (page 678)
°An I/O interrupt is just like an exception, except that:
 • An I/O interrupt is asynchronous
 • Further information needs to be conveyed
°An I/O interrupt is asynchronous with respect to instruction execution:
 • An I/O interrupt is not associated with any instruction
 • An I/O interrupt does not prevent any instruction from completing
  - You can pick your own convenient point to take an interrupt
°An I/O interrupt is more complicated than an exception:
 • It needs to convey the identity of the device generating the interrupt
 • Interrupt requests can have different urgencies:
  - Interrupt requests need to be prioritized

Interrupt Logic
°Detect and synchronize interrupt requests
 • Ignore interrupts that are disabled (masked off)
 • Rank the pending interrupt requests
 • Create the interrupt microsequence address
 • Provide select signals for the interrupt microsequence
[Figure: asynchronous interrupt requests pass through synchronizer circuits, are gated by the interrupt mask register, and enter a priority network that produces the microsequence address and select logic. Inset: a synchronizer built from two clocked D flip-flops in series, turning asynchronous inputs into synchronous inputs.]
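The masking and ranking steps of the interrupt logic above can be sketched in Python using bit masks. The encoding is an assumption of this sketch, not something the slides specify: bit i of `pending` means line i is requesting, bit i of `mask` means line i is enabled, and a higher line number means higher priority.

```python
def select_interrupt(pending, mask):
    """Return the highest-priority pending, enabled interrupt line, or None.

    pending: bit i set -> line i has an outstanding request
    mask:    bit i set -> line i is enabled (not masked off)
    Assumption for this sketch: higher line number = higher priority.
    """
    active = pending & mask          # ignore interrupts that are masked off
    if active == 0:
        return None                  # nothing to service
    return active.bit_length() - 1   # index of the most significant set bit


# Lines 1 and 3 are requesting; line 3 is masked off, so line 1 wins.
chosen = select_interrupt(pending=0b1010, mask=0b0111)
```

With all lines enabled (`mask=0b1111`) the same pending set selects line 3 instead, showing how the mask register changes the outcome of the priority network.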

7. Program Interrupt/Exception Hardware
°Hardware interrupt services:
 • Save the PC (or PCs in a pipelined machine)
 • Inhibit the interrupt that is being handled
 • Branch to the interrupt service routine
 • Options:
  - Save status, save registers, save interrupt information
  - Change status, change operating modes, get interrupt info
°A “good thing” about interrupts:
 • Asynchronous: not associated with a particular instruction
 • Pick the most convenient place in the pipeline to handle it

Programmer’s View
[Figure: the main program (Add, Div, Sub) is running when (1) an interrupt request arrives (e.g., from the keyboard); (2) the PC is saved and control "branches" to the interrupt target address, where the handler saves the processor status/state, services the (keyboard) interrupt, and restores the processor status/state; (3) the saved PC is fetched and the main program resumes.]
°Interrupt target address options:
 • General: branch to a common address for all interrupts. Software then decodes the cause and figures out what to do
 • Specific: automatically branch to different addresses based on interrupt type and/or level (vectored interrupt)
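The two interrupt target address options above can be contrasted in a small Python sketch. All names here (`keyboard_handler`, `disk_handler`, `common_entry`, `VECTOR_TABLE`, `vectored_entry`) and the cause/level encodings are hypothetical; real hardware branches to addresses, and Python callables merely stand in for those.

```python
serviced = []  # record of which handler ran, for illustration


def keyboard_handler():
    serviced.append("keyboard")


def disk_handler():
    serviced.append("disk")


# General: every interrupt branches to one common entry point,
# and software decodes the cause to decide what to do.
def common_entry(cause):
    if cause == "keyboard":
        keyboard_handler()
    elif cause == "disk":
        disk_handler()
    else:
        raise ValueError(f"unknown interrupt cause: {cause}")


# Specific (vectored): the interrupt type/level selects the target directly,
# modeled here as an index into a table of handlers.
VECTOR_TABLE = {0: keyboard_handler, 1: disk_handler}


def vectored_entry(level):
    VECTOR_TABLE[level]()


common_entry("keyboard")
vectored_entry(1)
```

The general scheme needs only one entry address but spends instructions decoding the cause; the vectored scheme skips that decoding at the cost of a table of target addresses.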

8. Example 2: Overhead of Interrupt (page 679)
°Consider the same hard disk and processor as in Example 1:
 • The hard disk transfers data in four-word chunks and can transfer at 4 MB/sec
 • The processor executes with a 500 MHz clock
°Assume that the overhead for each transfer, including the interrupt, is 500 clock cycles
°Determine the fraction of CPU time consumed if the hard disk is only transferring data 5% of the time.
 • The interrupt rate is the same as the polling rate = 4 MB/s ÷ 16 B = 250K/s
 • Fraction when 100% busy = 250K × 500 ÷ 500 MHz = 25%
 • Fraction when 5% busy = 25% × 5% = 1.25%
 • So interrupt-driven I/O is much better than polling. Reason?

Delegating I/O Responsibility from the CPU: DMA
°Direct Memory Access (DMA):
 • External to the CPU
 • Acts as a master on the bus
 • Transfers blocks of data to or from memory without CPU intervention
°The CPU sends a starting address, direction, and length count to the DMAC, then issues "start".
°The DMAC provides handshake signals for the peripheral controller, and memory addresses and handshake signals for memory.
[Figure: CPU, memory, DMAC, and IOC with its device.]
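The arithmetic of Example 2 can be reproduced with a short Python sketch. The function name `interrupt_fraction` and its parameter names are made up for illustration; 1 MB is taken as 1,000,000 bytes, matching the slide's 250K/s interrupt rate.

```python
def interrupt_fraction(bytes_per_sec, bytes_per_transfer,
                       cycles_per_interrupt, clock_hz, busy_fraction=1.0):
    """Fraction of CPU time spent servicing interrupts from one device."""
    interrupts_per_sec = bytes_per_sec / bytes_per_transfer
    overhead_when_busy = interrupts_per_sec * cycles_per_interrupt / clock_hz
    # Interrupts only occur while the device is actually transferring.
    return overhead_when_busy * busy_fraction


# Hard disk from Example 2: 4 MB/s, 16-byte chunks, 500 cycles per interrupt.
always_busy = interrupt_fraction(4_000_000, 16, 500, 500e6)
five_percent = interrupt_fraction(4_000_000, 16, 500, 500e6, busy_fraction=0.05)
```

The `busy_fraction` factor is what answers the slide's "Reason?": unlike polling, the interrupt cost is paid only while the disk is transferring, so 25% drops to 1.25% at 5% utilization.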

9. Example 3: Overhead of DMA (page 681)
°Consider the same hard disk and processor as in Examples 1 and 2:
 • The hard disk transfers at 4 MB/sec
 • The processor executes with a 500 MHz clock
°Assume the initial setup of a DMA transfer takes 1000 clock cycles, and the handling of the interrupt at DMA completion requires 500 cycles
°Determine the fraction of CPU time consumed if the hard disk is actively transferring 100% of the time and the average transfer from the disk is 8 KB
 • DMA transfer rate = 4 MB/s ÷ 8 KB = 500/s
 • Fraction when 100% busy = (1000 + 500) × 500 ÷ 500 MHz = 0.15% ≈ 0.2%
 • So DMA is even better than interrupt-driven I/O. Reason?

Delegating I/O Responsibility from the CPU: IOP
[Figure: the CPU and the IOP share main memory over the memory bus; devices D1, D2, ..., Dn attach to the IOP over the I/O bus.]
°(1) The CPU issues an instruction to the IOP: OP | Device | Address
 • Device: the target device
 • Address: where the commands are
°(2) The IOP looks in memory for commands: OP | Addr | Cnt | Other
 • OP: what to do
 • Addr: where to put the data
 • Cnt: how much
 • Other: special requests
°(3) Device to/from memory transfers are controlled by the IOP directly. The IOP steals memory cycles.
°(4) The IOP interrupts the CPU when done
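The DMA arithmetic of Example 3 can be checked the same way. The function name `dma_fraction` and its parameters are hypothetical; decimal units (1 MB = 1,000,000 bytes, 8 KB = 8,000 bytes) are assumed because the slide's rate of 500 transfers per second implies them.

```python
def dma_fraction(bytes_per_sec, bytes_per_transfer,
                 setup_cycles, completion_cycles, clock_hz):
    """Fraction of CPU time spent setting up and completing DMA transfers."""
    transfers_per_sec = bytes_per_sec / bytes_per_transfer
    cycles_per_transfer = setup_cycles + completion_cycles
    return transfers_per_sec * cycles_per_transfer / clock_hz


# Hard disk from Example 3: 4 MB/s in 8 KB transfers,
# 1000 cycles of setup and 500 cycles for the completion interrupt.
dma = dma_fraction(4_000_000, 8_000, 1000, 500, 500e6)
print(f"DMA overhead: {dma * 100:.2f}% of CPU time")
```

The result is 0.15%, which the slide rounds to 0.2%. The CPU is involved once per 8 KB block rather than once per 16-byte chunk, which is why the overhead is so much lower than the 25% of the interrupt-driven case.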

10. Summary:
°Three types of buses:
 • Processor-memory buses
 • I/O buses
 • Backplane buses
°Bus arbitration schemes:
 • Daisy chain arbitration: it cannot assure fairness
 • Centralized parallel arbitration: requires a central arbiter
°I/O device notifying the operating system:
 • Polling: it can waste a lot of processor time
 • I/O interrupt: similar to an exception except that it is asynchronous
°Delegating I/O responsibility from the CPU
 • Direct memory access (DMA)
 • I/O processor (IOP)

ECE4680: Objectives and Assessment
°In-depth understanding of the inner workings of modern computers, their evolution, and the trade-offs present at the hardware/software boundary.
 • Insight into fast/slow operations that are easy/hard to implement in hardware
°Experience with the design process in the context of a large, complex (hardware) design.
 • Functional Spec → Control & Datapath → Physical implementation

11. The Big Picture
°Processor: Control and Datapath
°Memory
°Input
°Output

Where have we visited? (ECE4680, Winter 2003)
°Performance evaluation
°Arithmetic
°Single/multicycle datapaths
°Pipelining
°Memory systems
°I/O
[Figures: a Booth multiplier datapath (multiplicand register, 34-bit ALU, HI and LO registers, Booth encoder, control logic); a plot of the Processor-Memory Performance Gap over time, with µProc (CPU) improving 60%/yr ("Moore’s Law", 2X/1.5 yr), DRAM improving 9%/yr (2X/10 yrs), and the gap growing 50%/year; a pipeline diagram with stages IFetch, Dcd, Exec, Mem, WB.]

12. Beyond ECE4680
[Figure: layers of a computer system, top to bottom: Application; Operating System and Compiler; Instruction Set Architecture; Instr. Set Proc. and I/O system; Digital Design; Circuit Design.]