A Major Overhaul of the APIC Initialization and Vector Allocation

Interrupts are one of the most important mechanisms of the Linux kernel, and the APIC and interrupt vectors determine how the kernel operates them. As the kernel has evolved, the old code has led to many problems, such as vector space exhaustion, chaotic vector allocation, kdump failures, and timer setup errors. Recently, Thomas Gleixner and Dou Liyang carried out a major overhaul. In this presentation, Dou Liyang describes the main process of interrupt initialization, discusses the challenges it faces, introduces what the overhaul does, and explains how it may address those challenges.

1. Kernel Interrupt: A Major Overhaul - APIC Initialization & Vector Allocation
Dou Liyang <douly.fnst@cn.fujitsu.com>
June 20, 2018
Copyright 2018 FUJITSU LIMITED

2. Outline
- Basics of an interrupt
- Overhaul of interrupt
  - APIC Initialization
  - Vector Allocation
- What's next?
  - Future work

3. What Is an Interrupt?
- A hardware signal
- Emitted from a peripheral to a CPU
- Indicating that a device-specific condition has been satisfied
[Figure: device -> CPU. From Marc Zyngier <marc.zyngier@arm.com>]

4. Multiplexing Interrupts
- Having a single interrupt for the CPU is usually not enough
  - Most systems have tens or even hundreds of them
- An interrupt controller allows them to be multiplexed
  - Very often architecture- or platform-specific
- Older x86 machines had a PIC (Programmable Interrupt Controller) called the 8259A
  - A chip responsible for sequentially processing multiple interrupt requests from multiple devices
  - This is called PIC mode
[Figure: several devices -> PIC -> CPU]
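To make the multiplexing concrete, here is a small user-space model of one 8259A. It assumes the chip's default fully nested mode, in which IRQ0 has the highest priority and a pending request is delivered only if no equal- or higher-priority request is still in service, and it ignores cascading to a second 8259A. The struct and function names are hypothetical; `vector_base` stands for the vector number that software programs into the chip for IRQ0 during initialization.

```c
#include <stdint.h>

/* Hypothetical model of one 8259A: 8 request lines, one output to the CPU. */
struct pic8259 {
    uint8_t irr;         /* Interrupt Request Register: lines currently asserted */
    uint8_t imr;         /* Interrupt Mask Register: 1 = line masked */
    uint8_t isr;         /* In-Service Register: lines being handled */
    uint8_t vector_base; /* vector delivered for IRQ0, set at init time */
};

/* A device asserts request line 'irq' (0..7). */
static void pic_raise(struct pic8259 *pic, unsigned irq)
{
    pic->irr |= (uint8_t)(1u << irq);
}

/*
 * CPU acknowledges: deliver the highest-priority pending, unmasked line
 * (lowest number), unless an equal- or higher-priority line is in service.
 * Returns the vector the CPU would receive, or -1 if nothing is deliverable.
 */
static int pic_ack(struct pic8259 *pic)
{
    uint8_t pending = pic->irr & (uint8_t)~pic->imr;
    for (unsigned irq = 0; irq < 8; irq++) {
        if (pic->isr & (1u << irq))
            return -1; /* blocked by an in-service line of higher/equal priority */
        if (pending & (1u << irq)) {
            pic->irr &= (uint8_t)~(1u << irq);
            pic->isr |= (uint8_t)(1u << irq);
            return pic->vector_base + (int)irq;
        }
    }
    return -1;
}

/* Non-specific End Of Interrupt: clear the highest-priority in-service bit. */
static void pic_eoi(struct pic8259 *pic)
{
    for (unsigned irq = 0; irq < 8; irq++) {
        if (pic->isr & (1u << irq)) {
            pic->isr &= (uint8_t)~(1u << irq);
            return;
        }
    }
}
```

For example, raising lines 4 and 1 and then acknowledging yields `vector_base + 1` first; line 4 is delivered only after `pic_eoi()` ends the service of line 1. This serialization is exactly what a single PIC cannot spread across several CPUs.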

5. Multiplexing Interrupts in SMP Systems
- A single CPU is usually not enough
  - Many systems have tens or even hundreds of CPUs
- A new interrupt controller is needed
- On x86 machines, this is the APIC (Advanced Programmable Interrupt Controller)
  - A Local APIC is located on each CPU core and handles the CPU-specific interrupt configuration
  - The I/O APIC distributes external interrupts from multiple devices to multiple CPU cores
  - This is called Symmetric I/O mode
[Figure: devices -> I/O APIC -> Local APIC of each CPU]
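To show what "distributing external interrupts" means at the register level: each I/O APIC input pin has a 64-bit redirection table entry that names the vector and the destination CPU. The sketch below packs such an entry using the bit layout from Intel's 82093AA I/O APIC datasheet (vector in bits 0-7, delivery mode in 8-10, destination mode in 11, polarity in 13, trigger mode in 15, mask in 16, destination in 56-63). The struct and function names are hypothetical, and this is not the kernel's own encoder.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of how one I/O APIC pin should be routed. */
struct ioapic_route {
    uint8_t vector;        /* IDT vector delivered to the target CPU */
    uint8_t delivery_mode; /* 3 bits: 0 = fixed, 1 = lowest priority, ... */
    bool    logical_dest;  /* false = physical APIC ID, true = logical */
    bool    active_low;    /* input pin polarity */
    bool    level_trigger; /* false = edge, true = level */
    bool    masked;        /* true = pin disabled */
    uint8_t dest;          /* destination APIC ID (xAPIC, 8 bits) */
};

/* Pack a route into the 64-bit redirection table entry format. */
static uint64_t ioapic_rte_encode(const struct ioapic_route *r)
{
    uint64_t rte = r->vector;                          /* bits 0-7   */
    rte |= (uint64_t)(r->delivery_mode & 0x7) << 8;    /* bits 8-10  */
    rte |= (uint64_t)r->logical_dest << 11;            /* bit 11     */
    rte |= (uint64_t)r->active_low << 13;              /* bit 13     */
    rte |= (uint64_t)r->level_trigger << 15;           /* bit 15     */
    rte |= (uint64_t)r->masked << 16;                  /* bit 16     */
    rte |= (uint64_t)r->dest << 56;                    /* bits 56-63 */
    return rte;
}
```

Because the destination is part of each pin's entry, moving an interrupt to another CPU amounts to rewriting that entry, which is why vector and CPU assignment become central to the rest of the talk.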

6. More than Wired Interrupts: MSIs
- Message Signaled Interrupts (MSIs) are an alternative to line-based interrupts
  - A device triggers an interrupt by writing a value to a particular memory address
  - This allows interrupts to use the same bus as the data
[Figure: devices with MSI capabilities signaling CPUs directly]
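On x86, the "value written to a particular memory address" has a fixed format: the address falls in the 0xFEExxxxx window decoded by the local APICs, and the data word carries the vector. The sketch below composes a message for fixed delivery to one CPU in physical destination mode, following the layout in the Intel SDM (destination APIC ID in address bits 12-19; vector in data bits 0-7, delivery mode in bits 8-10, trigger mode in bit 15). The names are hypothetical; it ignores interrupt remapping and x2APIC extended destination IDs.

```c
#include <stdint.h>

/* Hypothetical container for one MSI message (address/data pair). */
struct msi_msg_sketch {
    uint32_t address_lo; /* written to the device's MSI address register */
    uint32_t data;       /* written to the device's MSI data register */
};

#define MSI_ADDR_BASE 0xFEE00000u /* local APIC MSI window on x86 */

/*
 * Deliver 'vector' to the CPU whose APIC ID is 'apic_id': physical
 * destination mode and redirection hint 0 in the address; fixed delivery
 * and edge trigger (all zero bits) in the data.
 */
static struct msi_msg_sketch msi_compose(uint8_t apic_id, uint8_t vector)
{
    struct msi_msg_sketch msg;
    msg.address_lo = MSI_ADDR_BASE | ((uint32_t)apic_id << 12);
    msg.data = vector;
    return msg;
}
```

The vector travels inside the message itself, so an MSI still needs a vector assigned on the target CPU, just like an I/O APIC pin.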

7. Handle an Interrupt
- Preempt the current task (① ②)
  - Pause execution of the current process
- Execute the interrupt handler (③ to ⑤)
  - Find the handler for the interrupt and transfer control to it
- Resume the task (⑥)
  - Return to executing the current process
[Figure: the CPU runs the current task; an interrupt arrives and control passes to the interrupt handler, then back to the task (① to ⑥)]

8. How Does “Handle an Interrupt” Work?
- The APIC and vector mechanisms make it work:
  1. The APIC delivers the IRQ.
  2. The CPU looks up the handler in the IDT using the vector.
  3. The handler gets the irq_desc structure through the vector.
  4. It uses the irq_desc to get what the interrupt needs:
     • device info
     • interrupt controller info
     • IRQ action list info
  5. It executes the interrupt service routine (ISR).
[Figure: Interrupt -> (1) APIC -> (2) Vector -> (3) irq_desc -> (4) Device / Chip / Domain / Action -> (5) ISR]
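Steps 3 to 5 can be modeled in a few lines. In this sketch each CPU has its own table mapping a vector to an irq_desc, loosely following the per-CPU vector-to-descriptor array used on x86. The descriptor carries only an IRQ number, a chip name and a linked list of actions, whereas the real irq_desc holds much more (device, chip and domain data). All type and function names are hypothetical.

```c
#include <stddef.h>

#define NR_CPUS_SKETCH 4
#define NR_VECTORS     256

/* One registered handler: models an entry of the IRQ action list. */
struct irqaction_sketch {
    int (*handler)(int irq, void *dev_id); /* returns 1 if it handled the IRQ */
    void *dev_id;                          /* device-specific cookie */
    struct irqaction_sketch *next;         /* shared IRQs chain several actions */
};

/* Minimal stand-in for irq_desc. */
struct irq_desc_sketch {
    int irq;                         /* Linux IRQ number */
    const char *chip;                /* interrupt controller name */
    struct irqaction_sketch *action; /* list of handlers (the ISRs) */
};

/* Per-CPU vector -> descriptor table, filled in when a vector is assigned. */
static struct irq_desc_sketch *vector_table[NR_CPUS_SKETCH][NR_VECTORS];

/*
 * Map the received vector to its descriptor and run every action on the
 * list. Returns how many actions handled the interrupt, or -1 if no
 * descriptor is installed for this vector on this CPU.
 */
static int dispatch_vector(int cpu, unsigned vector)
{
    if (cpu < 0 || cpu >= NR_CPUS_SKETCH || vector >= NR_VECTORS)
        return -1;
    struct irq_desc_sketch *desc = vector_table[cpu][vector];
    if (desc == NULL)
        return -1;

    int handled = 0;
    for (struct irqaction_sketch *a = desc->action; a != NULL; a = a->next)
        handled += a->handler(desc->irq, a->dev_id);
    return handled;
}

/* Example ISR: counts how often it ran, using dev_id as an int counter. */
static int count_handler(int irq, void *dev_id)
{
    (void)irq;
    (*(int *)dev_id)++;
    return 1;
}
```

Because the table is per CPU, the same vector number can mean different interrupts on different CPUs; a descriptor installed for vector 0x41 on CPU 1 is invisible on CPU 0.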

9. Why Do the APIC and Vector Mechanisms Work?
- Linux does a lot of initialization and setup work when it boots:
  - For interrupt delivery:
    • Initialize the 8259A
    • Switch the interrupt delivery mode
    • Initialize the APIC: Local APIC setup and I/O APIC setup
  - For the IDT:
    • Initialize the mapping between vectors and handlers
  - For each interrupt:
    • Allocate an IRQ
    • Allocate an irq_desc
    • Assign a vector
[Figure: a device interrupts the CPU, which switches from normal context to interrupt context to run the interrupt handler]
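The "mapping between vectors and handlers" can be pictured as a 256-entry table filled in at boot: exception entries for vectors 0 to 31, one common entry for external interrupts, and dedicated entries for system vectors. The sketch below uses plain function pointers instead of real gate descriptors; the timer vector value and all names are illustrative assumptions, not values taken from the kernel headers.

```c
#define NR_VECTORS             256
#define FIRST_EXTERNAL_SKETCH  32   /* vectors 0..31 are CPU exceptions */
#define SYSCALL_VECTOR_SKETCH  0x80 /* legacy "int $0x80" system call gate */
#define TIMER_VECTOR_SKETCH    0xec /* illustrative system vector number */
#define SPURIOUS_VECTOR_SKETCH 0xff

typedef void (*entry_fn)(unsigned vector);

/* Placeholder entry points; real ones are assembly stubs. */
static void exception_entry(unsigned v)  { (void)v; }
static void common_irq_entry(unsigned v) { (void)v; }
static void syscall_entry(unsigned v)    { (void)v; }
static void timer_entry(unsigned v)      { (void)v; }
static void spurious_entry(unsigned v)   { (void)v; }

static entry_fn idt_sketch[NR_VECTORS];

static void idt_setup_sketch(void)
{
    /* Default: exceptions below 32, one common entry for everything else. */
    for (unsigned v = 0; v < NR_VECTORS; v++)
        idt_sketch[v] = v < FIRST_EXTERNAL_SKETCH ? exception_entry : common_irq_entry;

    /* System vectors get dedicated entry points instead of the common one. */
    idt_sketch[SYSCALL_VECTOR_SKETCH]  = syscall_entry;
    idt_sketch[TIMER_VECTOR_SKETCH]    = timer_entry;
    idt_sketch[SPURIOUS_VECTOR_SKETCH] = spurious_entry;
}
```

Everything that lands in the common entry still has to find its irq_desc through a per-CPU lookup. That is why each interrupt additionally needs an IRQ number, a descriptor and an assigned vector.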

10. Outline (current section: Overhaul of interrupt, APIC Initialization)

11. Existing Problems
- Interrupt handling on x86 is a conglomerate of ancient bits and pieces
- It has been subject to 'modernization' and new features over the years:
  • Kdump
  • CPU hotplug / system hibernation
  • Multi-queue devices
- It looks like a penguin full of band-aids
  - It works, but it is hard to see how it works.

12. Problems of APIC Initialization
- Horrible interrupt mode setup
  - The mode is set up in random places
  - The kernel potentially runs with the wrong mode
  - The timer setup is tangled with the interrupt initialization
[Figure: Timer setup tangled with PIC mode, Virtual Wire mode and Symmetric I/O mode]

13. Overhaul of APIC Initialization
- 1. Unify the APIC and interrupt mode setup
  - Construct a selector for the interrupt delivery mode
  - Selector inputs:
    • Kconfig: CONFIG_X86_64, CONFIG_X86_LOCAL_APIC, CONFIG_X86_IO_APIC, CONFIG_SMP
    • CPU capability: boot_cpu_has(X86_FEATURE_APIC)
    • MP table: smp_found_config
    • ACPI table: acpi_lapic, acpi_ioapic, nr_ioapic
    • Command line options: disable_apic, skip_ioapic_setup (nolapic / noapic / apic=)
  - Selector output: PIC mode, Virtual Wire mode or Symmetric I/O mode
  - See apic_intr_mode_select() in arch/x86/kernel/apic/apic.c
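The selector can be read as a pure function from the inputs listed above to one of the three modes. The sketch below is a simplified reading of that decision order, not a copy of apic_intr_mode_select(): the input struct is hypothetical, and the real function handles more cases (for example 32-bit-specific checks).

```c
#include <stdbool.h>

enum intr_mode_sketch {
    MODE_PIC,          /* keep using the 8259A */
    MODE_VIRTUAL_WIRE, /* local APIC enabled, no usable I/O APIC routing */
    MODE_SYMMETRIC_IO, /* local APIC + I/O APIC routing */
};

/* Hypothetical snapshot of the selector inputs. */
struct intr_mode_inputs {
    bool disable_apic;      /* local APIC disabled on the command line */
    bool cpu_has_apic;      /* boot_cpu_has(X86_FEATURE_APIC) */
    bool smp_found_config;  /* MP table or ACPI configuration found */
    bool skip_ioapic_setup; /* I/O APIC disabled on the command line */
    bool have_ioapic;       /* firmware describes at least one I/O APIC */
};

static enum intr_mode_sketch select_intr_mode(const struct intr_mode_inputs *in)
{
    /* 1. The command line can force PIC mode. */
    if (in->disable_apic)
        return MODE_PIC;
    /* 2. No local APIC reported by the CPU: PIC mode is the only option. */
    if (!in->cpu_has_apic)
        return MODE_PIC;
    /* 3. No firmware config or no usable I/O APIC: virtual wire mode. */
    if (!in->smp_found_config || in->skip_ioapic_setup || !in->have_ioapic)
        return MODE_VIRTUAL_WIRE;
    /* 4. Everything is present: symmetric I/O mode. */
    return MODE_SYMMETRIC_IO;
}
```

Concentrating the checks in one function means each input is consulted exactly once, in a fixed order, instead of being re-tested at scattered call sites.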

14. Overhaul of APIC Initialization
- 1. Unify the APIC and interrupt mode setup
  - Provide a single function, apic_intr_mode_init(), so the setup is finished at once
  - See apic_intr_mode_init() in arch/x86/kernel/apic/apic.c
[Figure: setup previously spread across init_bsp_APIC(), native_smp_prepare_cpus() and smp_init() is done at once in apic_intr_mode_init()]
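Having a single entry point means the mode is decided once, and every setup step then switches on that decision. A minimal sketch of that shape follows. The enum, the counters standing in for real setup work, and the log strings are hypothetical placeholders, not the kernel's actual functions or messages.

```c
#include <stdio.h>

enum intr_mode_sketch { MODE_PIC, MODE_VIRTUAL_WIRE, MODE_SYMMETRIC_IO };

/* Decided exactly once during boot; read-only afterwards. */
static enum intr_mode_sketch chosen_mode;
static int lapic_setups, ioapic_setups; /* stand-ins for real setup work */

static void setup_local_apic_sketch(void) { lapic_setups++; }
static void setup_io_apic_sketch(void)    { ioapic_setups++; }

/* The single place where the selected delivery mode is applied. */
static void intr_mode_init_sketch(enum intr_mode_sketch mode)
{
    chosen_mode = mode;
    switch (mode) {
    case MODE_PIC:
        printf("sketch: keep PIC mode (8259A)\n");
        break;
    case MODE_VIRTUAL_WIRE:
        printf("sketch: switch to virtual wire mode\n");
        setup_local_apic_sketch();
        break;
    case MODE_SYMMETRIC_IO:
        printf("sketch: switch to symmetric I/O mode\n");
        setup_local_apic_sketch();
        setup_io_apic_sketch();
        break;
    }
}
```

Later code can simply read `chosen_mode` instead of re-deriving the mode from the command line, firmware tables and CPU features.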

15. Overhaul of APIC Initialization
- 2. Disentangle the timer setup from the APIC initialization
  - Refactor the delay logic used during the APIC initialization process:
    • Use either the TSC or a simple delay loop to make a rough delay estimate
  - Split the local APIC timer setup from the APIC setup
[Figure: delay options shown: 400000000000/HZ TSC cycles; mdelay(10); 40940000000000/HZ TSC cycles]
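One plausible reading of the rough delay (an assumption, not stated on the slide) is that code checking whether the timer interrupt works must wait a few ticks without trusting either clock. The tick counter may never advance if the timer is broken, and the TSC frequency is not calibrated yet. So the wait ends when either a few ticks have passed or a generous cycle count has elapsed. The sketch below models that two-bound loop with injectable counters so it runs in user space. `read_cycles` and `read_ticks` stand in for the TSC and jiffies, and every name is hypothetical.

```c
#include <stdint.h>

/* A counter source; in the kernel the roles would be played by the TSC and jiffies. */
typedef uint64_t (*counter_fn)(void *ctx);

/*
 * Spin until 'cycle_budget' cycles have elapsed, or until the tick
 * counter passes start + max_ticks, whichever happens first.
 * Returns 1 if the cycle budget ended the wait, 0 if the tick bound did.
 */
static int rough_delay(counter_fn read_cycles, counter_fn read_ticks, void *ctx,
                       uint64_t cycle_budget, uint64_t max_ticks)
{
    uint64_t start = read_cycles(ctx);
    uint64_t end_tick = read_ticks(ctx) + max_ticks;

    for (;;) {
        if (read_cycles(ctx) - start >= cycle_budget)
            return 1;
        if (read_ticks(ctx) > end_tick)
            return 0;
    }
}

/* Fake clock: each cycle read advances time by 'step' cycles; a tick lasts
 * 'cycles_per_tick' cycles, and 0 models a timer that never ticks. */
struct fake_clock { uint64_t cycles, step, cycles_per_tick; };

static uint64_t fake_cycles(void *ctx)
{
    struct fake_clock *c = ctx;
    c->cycles += c->step;
    return c->cycles;
}

static uint64_t fake_ticks(void *ctx)
{
    const struct fake_clock *c = ctx;
    return c->cycles_per_tick ? c->cycles / c->cycles_per_tick : 0;
}
```

With a working timer the tick bound ends the wait, and with a dead timer the cycle budget still guarantees termination. Unsigned subtraction keeps the cycle comparison correct across wraparound; the tick comparison here does not handle wraparound, which kernel code would do with helpers such as time_before_eq().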

16. Overhaul of APIC Initialization
- 3. Reorganize the interrupt initialization
  - Set up the final interrupt delivery mode as soon as possible:
    1) Set up the legacy timer (PIT/HPET): x86_init.timers.timer_init()
    2) Set up the APIC/IOAPIC: x86_init.irqs.intr_mode_init()
    3) TSC calibration: tsc_init()
    4) Local APIC timer setup: x86_init.timers.setup_percpu_clockev()
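The reordering can be pictured as a fixed sequence of steps where each step checks what it depends on. In the sketch below, the dependencies are assumptions made for illustration: TSC calibration needs the legacy timer as a reference, and the local APIC timer needs interrupts in their final delivery mode. The step functions are hypothetical stand-ins for the x86_init callbacks named above, not their real definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* State that later steps depend on. */
static bool legacy_timer_ready, intr_mode_final, tsc_calibrated, lapic_timer_ready;

/* 1) PIT/HPET: a reference clock and tick source. */
static bool timer_init_step(void) { legacy_timer_ready = true; return true; }

/* 2) APIC/IOAPIC: interrupts now use their final delivery mode. */
static bool intr_mode_init_step(void) { intr_mode_final = true; return true; }

/* 3) TSC calibration: measured against the legacy timer. */
static bool tsc_init_step(void)
{
    if (!legacy_timer_ready)
        return false;
    tsc_calibrated = true;
    return true;
}

/* 4) Local APIC timer: assumed to need timer interrupts in the final mode. */
static bool percpu_clockev_step(void)
{
    if (!legacy_timer_ready || !intr_mode_final)
        return false;
    lapic_timer_ready = true;
    return true;
}

typedef bool (*init_step)(void);

static const init_step boot_order[] = {
    timer_init_step, intr_mode_init_step, tsc_init_step, percpu_clockev_step,
};

/* Run steps in order; return the index of the first failing step, or -1. */
static int run_steps(const init_step *steps, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!steps[i]())
            return (int)i;
    return -1;
}
```

Running the local APIC timer step before the interrupt mode is set fails at that step, while `boot_order` runs to completion. This mirrors the point of setting up the final delivery mode as early as possible.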

17. Overhaul of APIC Initialization
- 4. Other changes
  - Refactor some common APIC functions
  - Stay compatible with ACPI initialization
  - Bypass the hypervisor cases, such as KVM and Xen
- 5. The interrupt mode in use can be checked with 'dmesg'
[Figure: dmesg output showing the selected interrupt mode]

18. Outline (current section: Overhaul of interrupt, Vector Allocation)

19. Problems of Vector Allocation
- Horrible vector management mechanism
  - The same allocation path is abused for different types of interrupts
  - It serves all the different use cases in one go
  - It searches with nested loops
  - It causes vector space exhaustion
  - It allocates vectors at the wrong time and in the wrong place
- Some dubious properties cause high complexity
  - Multiple CPU affinities for one IRQ
  - Priority level spreading
  - Can it even work?
- Lack of instrumentation
  - All of this is a black box that gives no insight into the actual vector usage

20. Overhaul of Vector Allocation
- 1. Classify the types of vectors
- 2. Refactor the vector allocation mechanism
- 3. Switch to a reservation scheme
- 4. Some others
[Figure: Initialization / Request IRQ -> 1. Vector Classifier -> 2. Vector Allocator -> a vector ID; 3. Reservation Scheme applies to any function that requests a vector activation (IRQ enabled, IRQ startup)]

21. Overhaul of Vector Allocation
- 1. Classify the types of vectors
  - Each CPU has 256 vectors, but some are fixed:
    • 1. System vectors
      * Vectors 0 ... 31
      * Vector 128
      * Vectors INVALIDATE_TLB_VECTOR_START ... 255
    • 2. Legacy vectors
      * Vectors 0x30 ... 0x3f
  - The others are allocated dynamically for normal and managed interrupts.
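The classification above amounts to a range check per vector number. In the sketch below, the start of the high system range is a parameter because its concrete value (INVALIDATE_TLB_VECTOR_START on the slide) is not given here. The value 0xec used in the example is an assumption for illustration. The enum and function names are hypothetical.

```c
enum vector_class {
    VEC_SYSTEM,  /* fixed: exceptions, vector 128, high system vectors */
    VEC_LEGACY,  /* fixed: legacy (8259A/ISA) IRQ vectors 0x30..0x3f */
    VEC_DYNAMIC, /* allocated at runtime for normal and managed interrupts */
};

/* 'first_system_vector' stands for the start of the high system range. */
static enum vector_class classify_vector(unsigned vector, unsigned first_system_vector)
{
    if (vector <= 31 || vector == 128 || vector >= first_system_vector)
        return VEC_SYSTEM;
    if (vector >= 0x30 && vector <= 0x3f)
        return VEC_LEGACY;
    return VEC_DYNAMIC;
}

/* How many of the 256 per-CPU vectors are left for dynamic allocation. */
static unsigned count_dynamic(unsigned first_system_vector)
{
    unsigned n = 0;
    for (unsigned v = 0; v < 256; v++)
        if (classify_vector(v, first_system_vector) == VEC_DYNAMIC)
            n++;
    return n;
}
```

With a high system range starting at 0xec, `count_dynamic(0xec)` leaves 187 vectors per CPU (vectors 32 to 235 minus the 16 legacy vectors and vector 128). This shows how quickly the space shrinks once many devices each want several vectors.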

22. Overhaul of Vector Allocation
- 1. Classify the types of vectors
  - For external interrupts, the type depends on the interrupt affinity (the set of CPUs that can handle the interrupt):
    • 3. Normal vectors
    • 4. Managed vectors

                 | Normal interrupt                   | Managed interrupt
At setup time    | Affinity may be NULL               | Affinity must have been set up
                 | A subset of the online CPUs        | The possible CPUs may be included
User space       | Affinity can be modified           | Affinity is fixed
When migrating   | IRQ can be moved to any online CPU | IRQ can move only within its affinity,
                 |                                    | but can be shut down and restarted
                 | Affinity can even be reset         | Affinity can't be reset
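The table boils down to two rules: who may change the affinity, and what happens when the CPUs in it go offline. The sketch below encodes those rules with a 64-bit CPU mask. The struct, function names and return codes are hypothetical and only mirror the table, not the kernel's affinity code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-interrupt state; bit n of a mask stands for CPU n. */
struct irq_sketch {
    bool managed;      /* affinity chosen by the kernel at setup time */
    uint64_t affinity; /* CPUs allowed to handle this interrupt */
    int target_cpu;    /* CPU currently holding the vector, -1 = shut down */
};

static int first_cpu(uint64_t mask)
{
    for (int cpu = 0; cpu < 64; cpu++)
        if (mask & (UINT64_C(1) << cpu))
            return cpu;
    return -1;
}

/* User space writes a new affinity: only normal interrupts accept it. */
static int user_set_affinity(struct irq_sketch *irq, uint64_t mask)
{
    if (irq->managed || mask == 0)
        return -1; /* managed affinity is fixed; empty masks are rejected */
    irq->affinity = mask;
    return 0;
}

/* CPUs went offline; 'online' is the mask of CPUs still running. */
static void cpus_offline(struct irq_sketch *irq, uint64_t online)
{
    uint64_t candidates = irq->affinity & online;
    if (candidates) {
        irq->target_cpu = first_cpu(candidates); /* move within the affinity */
    } else if (irq->managed) {
        irq->target_cpu = -1;                    /* shut down, affinity kept */
    } else {
        irq->affinity = online;                  /* affinity is reset... */
        irq->target_cpu = first_cpu(online);     /* ...and the IRQ moves anywhere */
    }
}

/* CPUs came back: a shut-down managed interrupt restarts inside its affinity. */
static void cpus_online(struct irq_sketch *irq, uint64_t online)
{
    if (irq->managed && irq->target_cpu < 0 && (irq->affinity & online))
        irq->target_cpu = first_cpu(irq->affinity & online);
}
```

The managed path never widens the affinity, which is what lets the allocator reserve vectors for it only on the CPUs it names.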

23. Overhaul of Vector Allocation
- 2. Refactor the vector allocation mechanism
  - Create a new bitmap matrix allocator: the IRQ Matrix Allocator
[Figure: IRQ matrix. Global: counters (system, available, allocated) and a system bitmap 00000000000000000000000000000. Per CPU (CPU 0 ... CPU n): counters (available, allocated, managed), an allocated bitmap 00000000000000000000000 and a managed bitmap 00000000000000000000000]

24. Overhaul of Vector Allocation
- 2. Refactor the vector allocation mechanism
  - Use the matrix for system vectors
[Figure: same layout; global system bitmap 11111111000000000000000011111; per-CPU allocated bitmap 00000000000000000000000; managed bitmap 00000000000000000000000]

25. Overhaul of Vector Allocation
- 2. Refactor the vector allocation mechanism
  - Use the matrix for legacy vectors
[Figure: global system bitmap 11111111000000000000000011111; per-CPU allocated bitmap 00000011111100000000000; managed bitmap 00000000000000000000000]
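Slides 23 to 25 describe one data structure: a global system bitmap shared by all CPUs, plus per-CPU "allocated" and "managed" bitmaps, each with counters. The following sketch shows that layout and its first two uses: marking system vectors once for every CPU, and placing a legacy vector in one CPU's allocated map. It is loosely modeled on the IRQ matrix allocator in kernel/irq/matrix.c, but the types, names and the 4-CPU size are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define MATRIX_BITS 256
#define NCPU        4
#define WORDS       (MATRIX_BITS / 64)

typedef struct { uint64_t w[WORDS]; } vbitmap;

static void vb_set(vbitmap *b, unsigned bit) { b->w[bit / 64] |= UINT64_C(1) << (bit % 64); }
static bool vb_test(const vbitmap *b, unsigned bit) { return (b->w[bit / 64] >> (bit % 64)) & 1; }

/* Per-CPU part of the matrix. */
struct cpumap_sketch {
    unsigned available; /* vectors still free on this CPU */
    unsigned allocated; /* vectors handed out on this CPU */
    unsigned managed;   /* vectors reserved for managed interrupts */
    vbitmap alloc_map;
    vbitmap managed_map;
};

/* Global part: system vectors are identical on every CPU. */
struct matrix_sketch {
    unsigned system;           /* number of system vectors */
    unsigned global_available; /* sum of per-CPU available counters */
    unsigned total_allocated;
    vbitmap system_map;
    struct cpumap_sketch cpu[NCPU];
};

static void matrix_init(struct matrix_sketch *m)
{
    *m = (struct matrix_sketch){0};
    for (int c = 0; c < NCPU; c++)
        m->cpu[c].available = MATRIX_BITS;
    m->global_available = MATRIX_BITS * NCPU;
}

/* System vector: one bit in the shared map, removed from every CPU's pool. */
static void matrix_assign_system(struct matrix_sketch *m, unsigned vec)
{
    if (vb_test(&m->system_map, vec))
        return;
    vb_set(&m->system_map, vec);
    m->system++;
    for (int c = 0; c < NCPU; c++)
        m->cpu[c].available--;
    m->global_available -= NCPU;
}

/* Legacy vector: a fixed bit in one CPU's allocated map. */
static int matrix_assign_legacy(struct matrix_sketch *m, int cpu, unsigned vec)
{
    struct cpumap_sketch *cm = &m->cpu[cpu];
    if (vb_test(&m->system_map, vec) || vb_test(&cm->alloc_map, vec))
        return -1;
    vb_set(&cm->alloc_map, vec);
    cm->allocated++;
    cm->available--;
    m->total_allocated++;
    m->global_available--;
    return 0;
}
```

The counters are what the old allocator lacked: after any sequence of operations they show how many vectors are system, allocated or still available, per CPU and in total.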

26. Overhaul of Vector Allocation
- 2. Refactor the vector allocation mechanism
  - Use the matrix for normal vectors
[Figure: global system bitmap 11111111000000000000000011111; per-CPU allocated bitmap 00000011111110000000000 (Step 1); managed bitmap 00000000000000111000000 (Step 2)]

27. Overhaul of Vector Allocation
- 2. Refactor the vector allocation mechanism
  - Use the matrix for managed vectors
[Figure: global system bitmap 11111111000000000000000011111; per-CPU allocated bitmap 00000011111111000000000; managed bitmap 00000000001100001111000; annotated Step 1 to Step 3]
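For dynamic vectors, the two interrupt types use the per-CPU maps differently. In this sketch, a normal interrupt takes a free bit on the CPU of its affinity with the most free vectors. A managed interrupt first reserves a bit in the managed map of every CPU in its affinity, and later turns one reservation into an allocated vector. This is a compact, self-contained model loosely following kernel/irq/matrix.c. It uses bool arrays instead of bitmaps, does not roll back partial reservations, and all names are hypothetical.

```c
#include <stdbool.h>

#define NCPU 4
#define NVEC 256

/* Per-CPU state (bool arrays instead of bitmaps for readability). */
struct percpu_sketch {
    bool alloc[NVEC];   /* vector handed out on this CPU */
    bool managed[NVEC]; /* vector reserved for a managed interrupt */
    unsigned available; /* neither system, allocated nor reserved */
};

static bool system_map[NVEC]; /* same on every CPU */
static struct percpu_sketch pcpu[NCPU];

static void model_init(unsigned first_dyn, unsigned first_system)
{
    for (unsigned v = 0; v < NVEC; v++)
        system_map[v] = v < first_dyn || v >= first_system;
    for (int c = 0; c < NCPU; c++) {
        pcpu[c] = (struct percpu_sketch){0};
        pcpu[c].available = first_system - first_dyn;
    }
}

/* A vector is free on a CPU if no map claims it. */
static int find_free(const struct percpu_sketch *p)
{
    for (unsigned v = 0; v < NVEC; v++)
        if (!system_map[v] && !p->alloc[v] && !p->managed[v])
            return (int)v;
    return -1;
}

/* Normal interrupt: use the CPU in 'mask' with the most free vectors. */
static int alloc_normal(unsigned mask, int *cpu_out)
{
    int best = -1;
    for (int c = 0; c < NCPU; c++)
        if ((mask & (1u << c)) && (best < 0 || pcpu[c].available > pcpu[best].available))
            best = c;
    if (best < 0)
        return -1;
    int v = find_free(&pcpu[best]);
    if (v < 0)
        return -1;
    pcpu[best].alloc[v] = true;
    pcpu[best].available--;
    *cpu_out = best;
    return v;
}

/* Managed interrupt, setup time: reserve one vector on every CPU in 'mask'. */
static int reserve_managed(unsigned mask)
{
    for (int c = 0; c < NCPU; c++) {
        if (!(mask & (1u << c)))
            continue;
        int v = find_free(&pcpu[c]);
        if (v < 0)
            return -1;
        pcpu[c].managed[v] = true;
        pcpu[c].available--;
    }
    return 0;
}

/* Managed interrupt, activation: turn a reservation on 'cpu' into a real vector. */
static int alloc_managed(int cpu)
{
    for (unsigned v = 0; v < NVEC; v++)
        if (pcpu[cpu].managed[v] && !pcpu[cpu].alloc[v]) {
            pcpu[cpu].alloc[v] = true;
            return (int)v;
        }
    return -1;
}
```

Because managed reservations are excluded from `find_free()`, a normal interrupt can never steal a vector that a managed interrupt was promised. Spreading normal interrupts to the least loaded CPU keeps the per-CPU pools balanced.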

28. Overhaul of Vector Allocation
- 3. Switch to a reservation scheme
  - Reserve a new system vector, just in case
- 3.1 When the interrupt is allocated and initialized:
  - Previously (wasteful): assign a real vector to each interrupt
  - Now (reservation):
    1. Update the reservation request counter
    2. Assign the reserved vector to each interrupt

29. Overhaul of Vector Allocation
- 3. Switch to a reservation scheme
  - Separate activation and startup
  - Assign the real vector only when it is needed, saving vector space
- 3.2 When the interrupt is requested:
[Figure: for normal interrupts, Activate assigns a real vector, then Startup continues; for managed interrupts, Activate can fail, and Startup assigns a real vector before continuing]
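The saving from the reservation scheme comes from counting instead of assigning. At allocation time only a reservation counter changes, and a real vector is taken from the pool when the interrupt is requested and activated, which is the step that can now fail. The sketch below contrasts the two policies for a batch of interrupts of which only a few are ever requested. It models a single CPU's pool and normal interrupts only, and the pool size, counters and function names are hypothetical simplifications.

```c
#include <stdbool.h>

/* Hypothetical single-CPU vector pool. */
struct pool_sketch {
    unsigned free_vectors; /* real vectors still available */
    unsigned reserved;     /* interrupts allocated but not yet activated */
};

struct irq_state { bool activated; };

/* Old policy: every allocated interrupt consumes a real vector at once. */
static int alloc_eager(struct pool_sketch *p, struct irq_state *irq)
{
    if (p->free_vectors == 0)
        return -1;
    p->free_vectors--;
    irq->activated = true;
    return 0;
}

/* Reservation policy, allocation time: only count the request. */
static void alloc_reserve(struct pool_sketch *p, struct irq_state *irq)
{
    p->reserved++;
    irq->activated = false;
}

/* Reservation policy, request/activation time: take a real vector.
 * This is where a failure can now happen if the pool is exhausted. */
static int activate(struct pool_sketch *p, struct irq_state *irq)
{
    if (irq->activated)
        return 0;
    if (p->free_vectors == 0)
        return -1;
    p->free_vectors--;
    p->reserved--;
    irq->activated = true;
    return 0;
}
```

With a pool of 187 vectors and 200 allocated interrupts, the eager policy fails 13 allocations outright. The reservation policy accepts all 200 and, if only 10 are requested, consumes just 10 real vectors. The cost of the design is that an error which used to surface at allocation time now surfaces at activation.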