PingCAP-Infra-Meetup-97-longheng-An-introduction-to-failpoint

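In this talk, Heng Long first introduces the scenarios where failpoints are used and the pros and cons of github.com/etcd-io/gofail. He then covers the design principles of Failpoint, its implementation details, and the trade-offs made during implementation. Finally, he demonstrates each marker function and shows how to use a context to control failpoint Enable/Disable in parallel tests, so that concurrent tasks are isolated from one another.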

1. An Introduction to Failpoint Design
Presented by Heng Long

2. Agenda
● Why do we need failpoints?
● Why reinvent the wheel?
● Implementation details
● Marker functions demo

3. Why we need failpoints
● Errors can happen anywhere, at any time
● Some errors are hard to reproduce
    ○ Hardware
        ■ disk error
        ■ network error
        ■ CPU
        ■ clock
    ○ Software
        ■ file system
        ■ network & protocol
        ■ library
● We need to simulate everything to cover corner cases

4. Part I - Why reinvent the wheel

5. About gofail
● An implementation of FreeBSD failpoints for Golang
    ○ https://www.freebsd.org/cgi/man.cgi?query=fail
● Failpoints are defined in comments
    ○ gofail enable converts the comments to code
    ○ gofail disable converts the code back to comments

6. Define a failpoint in gofail

    // gofail: var FailIfImportedChunk int
    // if merger, ok := scp.merger.(*ChunkCheckpointMerger); ok && merger.Checksum.SumKVS() >= uint64(FailIfImportedChunk) {
    //     rc.checkpointsWg.Done()
    //     rc.checkpointsWg.Wait()
    //     panic("forcing failure due to FailIfImportedChunk")
    // }
    // goto RETURN1

    // gofail: RETURN1:

    // gofail: var FailIfStatusBecomes int
    // if merger, ok := scp.merger.(*StatusCheckpointMerger); ok && merger.EngineID >= 0 && int(merger.Status) == FailIfStatusBecomes {
    //     rc.checkpointsWg.Done()
    //     rc.checkpointsWg.Wait()
    //     panic("forcing failure due to FailIfStatusBecomes")
    // }
    // goto RETURN2

    // gofail: RETURN2:

https://github.com/pingcap/tidb-lightning/blob/2792368c60b548c222b24c030b01d317ed6c5891/lightning/restore/restore.go#L361

7. The generated failpoint code

    if vFailIfImportedChunk, __fpErr := __fp_FailIfImportedChunk.Acquire(); __fpErr == nil {
        defer __fp_FailIfImportedChunk.Release()
        FailIfImportedChunk, __fpTypeOK := vFailIfImportedChunk.(int)
        if !__fpTypeOK {
            goto __badTypeFailIfImportedChunk
        }
        if merger, ok := scp.merger.(*ChunkCheckpointMerger); ok && merger.Checksum.SumKVS() >= uint64(FailIfImportedChunk) {
            rc.checkpointsWg.Done()
            rc.checkpointsWg.Wait()
            panic("forcing failure due to FailIfImportedChunk")
        }
        goto RETURN1
    __badTypeFailIfImportedChunk:
        __fp_FailIfImportedChunk.BadType(vFailIfImportedChunk, "int")
    }
    /* gofail-label */ RETURN1:
    if vFailIfStatusBecomes, __fpErr := __fp_FailIfStatusBecomes.Acquire(); __fpErr == nil {
        defer __fp_FailIfStatusBecomes.Release()
        FailIfStatusBecomes, __fpTypeOK := vFailIfStatusBecomes.(int)
        if !__fpTypeOK {
            goto __badTypeFailIfStatusBecomes
        }
        if merger, ok := scp.merger.(*StatusCheckpointMerger); ok && merger.EngineID >= 0 && int(merger.Status) == FailIfStatusBecomes {
            rc.checkpointsWg.Done()
            rc.checkpointsWg.Wait()
            panic("forcing failure due to FailIfStatusBecomes")
        }
        goto RETURN2
    __badTypeFailIfStatusBecomes:
        __fp_FailIfStatusBecomes.BadType(vFailIfStatusBecomes, "int")
    }
    /* gofail-label */ RETURN2:

8. The ideal form of failpoint (used in TiKV)

    fail_point!("transport_on_send_store", |sid| if let Some(sid) = sid {
        let sid: u64 = sid.parse().unwrap();
        if sid == store_id {
            self.raft_client.wl().addrs.remove(&store_id);
        }
    })

● What difficulties have we encountered?
    ○ No macro support in Golang
    ○ No compiler plugin support in Golang
    ○ Using go build tags (go build --tags="enable-failpoint") is not elegant

9. Design principles
● Define failpoints in valid Golang code, not in comments or anything else
● Failpoints should be zero cost
    ○ They must not affect regular logic
    ○ They must not cause performance regressions in regular code
    ○ Failpoint code must not appear in the final binary
● Failpoints must be easy to write and read, and checked by the compiler
● The generated code should be easy to read
● Line numbers must stay the same after rewriting (for easy debugging)
● Support parallel tests with context.Context

10. Part II - Implementation details

11. Implementation in gofail
● gofail enable
    ○ Iterate over the Golang source code line by line and find lines starting with "// gofail:"
    ○ Generate the failpoint from the Golang CommentGroup
    ○ https://github.com/pingcap/gofail/blob/6a951c1e42c3ad095ef5f5b0f147682a01e49a2d/code/rewrite.go#L26
● gofail disable
    ○ The reverse process
    ○ https://github.com/pingcap/gofail/blob/6a951c1e42c3ad095ef5f5b0f147682a01e49a2d/code/rewrite.go#L75
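The line-oriented idea behind `gofail enable` can be sketched as below. This is a hypothetical, simplified `enable` function written for illustration, not the code behind the rewrite.go links above: it only strips the `// ` prefix from comment lines that follow a `// gofail:` marker line, whereas gofail itself generates Acquire/Release code like the generated-code slide shows.

```go
package main

import (
	"fmt"
	"strings"
)

// enable is a simplified sketch of a line-oriented "enable" pass.
// After a line starting with "// gofail:", it uncomments the following
// "// "-prefixed lines until a non-comment line ends the block.
// The "// gofail:" marker line itself stays a comment.
func enable(src string) string {
	lines := strings.Split(src, "\n")
	inBlock := false
	for i, line := range lines {
		trimmed := strings.TrimSpace(line)
		switch {
		case strings.HasPrefix(trimmed, "// gofail:"):
			inBlock = true
		case inBlock && strings.HasPrefix(trimmed, "// "):
			// Keep the original indentation and drop the "// " prefix.
			indent := line[:len(line)-len(strings.TrimLeft(line, " \t"))]
			lines[i] = indent + strings.TrimPrefix(trimmed, "// ")
		default:
			inBlock = false
		}
	}
	return strings.Join(lines, "\n")
}

func main() {
	src := "// gofail: var FailIfBusy int\n" +
		"// if FailIfBusy > 0 {\n" +
		"// \tpanic(\"forced failure\")\n" +
		"// }\n" +
		"doWork()"
	fmt.Println(enable(src))
}
```

Because each comment line maps to exactly one code line, a pass like this keeps line numbers stable; the reverse pass (disable) would need to remember which lines it uncommented, which this sketch does not attempt.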

12. Implementation in the new failpoint
● Define a group of marker functions
● Parse imports and prune source files that do not import failpoint
● Traverse the AST to find marker function calls
● Rewrite each marker function call into an IF statement, which calls failpoint.Eval to determine whether the failpoint is active and executes the failpoint code if it is enabled

    var outerVar = "declare in outer scope"
    failpoint.Inject("failpoint-name", func(val failpoint.Value) {
        fmt.Println("unit-test", val, outerVar)
    })

After the AST rewrite:

    var outerVar = "declare in outer scope"
    if ok, val := failpoint.Eval(_curpkg_("failpoint-name")); ok {
        fmt.Println("unit-test", val, outerVar)
    }


14. What is a marker function?
● It is just an empty function
    ○ It hints the rewriter to replace the call with an equivalent statement
    ○ Its parameters serve as the rewrite rule
    ○ It is inlined at compile time and emits nothing into the binary (zero cost)
    ○ The closure can access outer variables with valid syntax (closure capture), and the rewritten IF statement is still legal because every captured variable simply becomes an access to an outer-scope variable. That's awesome
● It is easy to write and read
● It introduces compiler checks for failpoints: invalid failpoint code fails to compile even in regular mode

15. Marker function calls are eliminated

main.go:

     1  package main
     2
     3  import (
     4      "fmt"
     5      "math/rand"
     6  )
     7
     8  func main() {
     9      marker("hello-test", func() {
    10          var x = rand.Intn(100)
    11          var y = rand.Intn(100)
    12          if x == y {
    13              fmt.Println("lucky very much")
    14          }
    15          if x < y {
    16              fmt.Println("x less than y")
    17          } else {
    18              fmt.Println("x great or equal than y")
    19          }
    20          fmt.Println("hell test")
    21          fmt.Println("hell test")
    22      })
    23      fmt.Println("vim-go")
    24  }
    25
    26  func marker(fpname string, block func()) {
    27  }

Disassembly of main.main:

    143856  TEXT main.main(SB) /Users/lonng/devs/tmp/inline/main.go
    143857    main.go:8     0x1093240  65488b0c2530000000  MOVQ GS:0x30, CX
    143858    main.go:8     0x1093249  483b6110            CMPQ 0x10(CX), SP
    143859    main.go:8     0x109324d  7672                JBE 0x10932c1
    143860    main.go:8     0x109324f  4883ec58            SUBQ $0x58, SP
    143861    main.go:8     0x1093253  48896c2450          MOVQ BP, 0x50(SP)
    143862    main.go:8     0x1093258  488d6c2450          LEAQ 0x50(SP), BP
    143863    main.go:9     0x109325d  90                  NOPL
    143864    main.go:9     0x109325e  0f57c0              XORPS X0, X0
    143865    main.go:23    0x1093261  0f11442440          MOVUPS X0, 0x40(SP)
    143866    main.go:23    0x1093266  488d05330e0100      LEAQ type.*+68960(SB), AX
    143867    main.go:23    0x109326d  4889442440          MOVQ AX, 0x40(SP)
    143868    main.go:23    0x1093272  488d0527b90400      LEAQ main.statictmp_0(SB), AX
    143869    main.go:23    0x1093279  4889442448          MOVQ AX, 0x48(SP)
    143870    main.go:23    0x109327e  90                  NOPL
    143871    print.go:275  0x109327f  488b05da060e00      MOVQ os.Stdout(SB), AX
    143872    print.go:275  0x1093286  488d0d53d10400      LEAQ go.itab.*os.File,io.Writer(SB), CX
    143873    print.go:275  0x109328d  48890c24            MOVQ CX, 0(SP)
    143874    print.go:275  0x1093291  4889442408          MOVQ AX, 0x8(SP)
    143875    print.go:275  0x1093296  488d442440          LEAQ 0x40(SP), AX
    143876    print.go:275  0x109329b  4889442410          MOVQ AX, 0x10(SP)
    143877    print.go:275  0x10932a0  48c744241801000000  MOVQ $0x1, 0x18(SP)
    143878    print.go:275  0x10932a9  48c744242001000000  MOVQ $0x1, 0x20(SP)
    143879    print.go:275  0x10932b2  e8a98dffff          CALL fmt.Fprintln(SB)
    143880    print.go:275  0x10932b7  488b6c2450          MOVQ 0x50(SP), BP
    143881    print.go:275  0x10932bc  4883c458            ADDQ $0x58, SP
    143882    print.go:275  0x10932c0  c3                  RET
    143883    main.go:8     0x10932c1  e87abafbff          CALL runtime.morestack_noctxt(SB)
    143884    main.go:8     0x10932c6  e975ffffff          JMP main.main(SB)
    143885    :-1           0x10932cb  cc                  INT $0x3
    143886    :-1           0x10932cc  cc                  INT $0x3

Nothing from the closure body (lines 10 to 22) appears: apart from the function prologue (line 8) and two instructions attributed to the inlined marker call on line 9, the only code is the fmt.Println on line 23, itself inlined from print.go:275.

16. Marker functions list
● func Inject(fpname string, fpblock func(val Value)) {}
● func InjectContext(fpname string, ctx context.Context, fpblock func(val Value)) {}
● func Break(label ...string) {}
● func Goto(label string) {}
● func Continue(label ...string) {}
● func Fallthrough() {}
● func Return(results ...interface{}) {}
● func Label(label string) {}
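To see why an empty marker is harmless in regular mode while the rewritten code is live, here is a self-contained sketch. The registry functions `Enable`, `Disable` and `Eval` are hypothetical stand-ins written for this example (not the real failpoint package); `Eval` returns `(ok, val)` in the order the rewritten code on these slides uses, and `_curpkg_` is omitted for brevity.

```go
package main

import (
	"fmt"
	"sync"
)

// Value stands in for failpoint.Value (assumption: any value).
type Value = interface{}

// Inject is a marker: an empty function. In regular mode the closure is
// never called, so after inlining no code needs to be emitted for it.
func Inject(fpname string, fpblock func(val Value)) {}

// Hypothetical registry consulted by rewritten code at run time.
var (
	mu      sync.RWMutex
	enabled = map[string]Value{}
)

func Enable(name string, val Value) {
	mu.Lock()
	defer mu.Unlock()
	enabled[name] = val
}

func Disable(name string) {
	mu.Lock()
	defer mu.Unlock()
	delete(enabled, name)
}

// Eval reports whether name is enabled and, if so, its value.
func Eval(name string) (bool, Value) {
	mu.RLock()
	defer mu.RUnlock()
	val, ok := enabled[name]
	return ok, val
}

// regularMode is the source as written, with the marker call.
func regularMode() string {
	out := "normal"
	Inject("demo-fp", func(val Value) { out = fmt.Sprint("injected ", val) })
	return out
}

// rewrittenMode is what a rewriter would turn regularMode into.
func rewrittenMode() string {
	out := "normal"
	if ok, val := Eval("demo-fp"); ok {
		out = fmt.Sprint("injected ", val)
	}
	return out
}

func main() {
	Enable("demo-fp", 42)
	fmt.Println(regularMode())   // prints "normal": the marker never runs the closure
	fmt.Println(rewrittenMode()) // prints "injected 42"
	Disable("demo-fp")
	fmt.Println(rewrittenMode()) // prints "normal"
}
```

Note how `out` is captured by the closure in `regularMode` and becomes a plain outer-scope access in `rewrittenMode`, which is the property the marker-function slide relies on.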

17. Part III - Marker functions demo

18. failpoint.Inject
● Can be used anywhere
● failpoint-name is used to trigger the failpoint
● The failpoint closure is expanded into the body of the IF statement

    failpoint.Inject("failpoint-name", func(val failpoint.Value) {
        fmt.Println("unit-test", val)
    })

After rewriting:

    if ok, val := failpoint.Eval(_curpkg_("failpoint-name")); ok {
        fmt.Println("unit-test", val)
    }

● failpoint.Value can be ignored

    failpoint.Inject("failpoint-name", func() {
        fmt.Println("unit-test")
    })

    failpoint.Inject("failpoint-name", func(_ failpoint.Value) {
        fmt.Println("unit-test")
    })

Both are rewritten to:

    if ok, _ := failpoint.Eval(_curpkg_("failpoint-name")); ok {
        fmt.Println("unit-test")
    }
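The rewritten code wraps every failpoint name in `_curpkg_`. A plausible reading, assumed here rather than taken from the slides, is that it qualifies the name with the current package's import path so that identically named failpoints in different packages do not collide. The sketch below uses a hypothetical package path and a hypothetical registry, and shows an Inject closure writing to a captured named result.

```go
package main

import "fmt"

// Hypothetical import path of the package being rewritten.
const pkgPath = "example.com/demo/store"

// _curpkg_ qualifies a failpoint name with the current package path
// (assumed behaviour of the generated helper).
func _curpkg_(name string) string { return pkgPath + "/" + name }

// Hypothetical global registry of enabled failpoints.
var enabled = map[string]interface{}{}

func Eval(name string) (bool, interface{}) {
	val, ok := enabled[name]
	return ok, val
}

// write is the rewritten form of:
//
//	failpoint.Inject("write-fail", func(val failpoint.Value) {
//		err = fmt.Errorf("injected error: %v", val)
//	})
func write() (err error) {
	if ok, val := Eval(_curpkg_("write-fail")); ok {
		err = fmt.Errorf("injected error: %v", val)
	}
	return err
}

func main() {
	fmt.Println(write()) // <nil>
	enabled["example.com/demo/store/write-fail"] = "disk full"
	fmt.Println(write()) // injected error: disk full
}
```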

19. failpoint.InjectContext
● Can be used anywhere
● The context controls whether a failpoint is enabled, which lets parallel tests be isolated

    failpoint.InjectContext("failpoint-name", ctx, func(val failpoint.Value) {
        fmt.Println("unit-test", val)
    })

After rewriting:

    if ok, val := failpoint.EvalContext(ctx, _curpkg_("failpoint-name")); ok {
        fmt.Println("unit-test", val)
    }

● The context.Context can be nil
● Outer-scope variables can be used directly, without any extra cost

    var outerVar = "declare in outer scope"
    failpoint.InjectContext("failpoint-name", nil, func(val failpoint.Value) {
        fmt.Println("unit-test", val, outerVar)
    })

After rewriting:

    var outerVar = "declare in outer scope"
    if ok, val := failpoint.EvalContext(nil, _curpkg_("failpoint-name")); ok {
        fmt.Println("unit-test", val, outerVar)
    }

20. failpoint.InjectContext
You can control failpoints with failpoint.WithHook:

    func (s *dmlSuite) TestCRUDParallel() {
        sctx := failpoint.WithHook(context.Background(), func(ctx context.Context, fpname string) bool {
            return ctx.Value(fpname) != nil // Determined by a ctx key
        })
        insertFailpoints := map[string]struct{}{
            "insert-record-fp": {},
            "insert-index-fp":  {},
            "on-duplicate-fp":  {},
        }
        ictx := failpoint.WithHook(context.Background(), func(ctx context.Context, fpname string) bool {
            _, found := insertFailpoints[fpname] // Enables only these failpoints
            return found
        })
        deleteFailpoints := map[string]struct{}{
            "tikv-is-busy-fp":   {},
            "fetch-tso-timeout": {},
        }
        dctx := failpoint.WithHook(context.Background(), func(ctx context.Context, fpname string) bool {
            _, found := deleteFailpoints[fpname] // Disables only these failpoints
            return !found
        })
        // ... other DML parallel test cases
        s.RunParallel(buildSelectTests(sctx))
        s.RunParallel(buildInsertTests(ictx))
        s.RunParallel(buildDeleteTests(dctx))
    }

21. failpoint.Break
● Can be used in a loop context
● Breaks out of the innermost loop

    for i := 0; i < 100; i++ {
        failpoint.Inject("control-flow", func(val failpoint.Value) {
            if i%10 == val.(int) {
                failpoint.Break()
            }
        })
    }

After rewriting:

    for i := 0; i < 100; i++ {
        if ok, val := failpoint.Eval(_curpkg_("control-flow")); ok {
            if i%10 == val.(int) {
                break
            }
        }
    }

22. failpoint.Break
● Can be used in a loop context
● Breaks to a label
● The label is also used by non-failpoint code

    label:
    for i := 0; i < 100; i++ {
        if i%5 == 0 {
            continue label
        }
        failpoint.Inject("control-flow", func(val failpoint.Value) {
            if i%10 == val.(int) {
                failpoint.Break("label")
            }
        })
    }

After rewriting:

    label:
    for i := 0; i < 100; i++ {
        if i%5 == 0 {
            continue label
        }
        if ok, val := failpoint.Eval(_curpkg_("control-flow")); ok {
            if i%10 == val.(int) {
                break label
            }
        }
    }

23. failpoint.Label
● Defines a label
● Can be used anywhere

    failpoint.Label("label")
    for i := 0; i < 100; i++ {
        failpoint.Inject("control-flow", func(val failpoint.Value) {
            if i%10 == val.(int) {
                failpoint.Break("label")
            }
        })
    }

After rewriting:

    label:
    for i := 0; i < 100; i++ {
        if ok, val := failpoint.Eval(_curpkg_("control-flow")); ok {
            if i%10 == val.(int) {
                break label
            }
        }
    }

24. failpoint.Continue
● Can be used in a loop context
● Skips an iteration

    for i := 0; i < 100; i++ {
        failpoint.Inject("control-flow", func(val failpoint.Value) {
            if i%10 == val.(int) {
                failpoint.Continue()
            }
        })
    }

After rewriting:

    for i := 0; i < 100; i++ {
        if ok, val := failpoint.Eval(_curpkg_("control-flow")); ok {
            if i%10 == val.(int) {
                continue
            }
        }
    }

25. failpoint.Continue
● Can be used in a loop context
● Continues to a label

    label:
    for i := 0; i < 100; i++ {
        if i%5 == 0 {
            break label
        }
        failpoint.Inject("control-flow", func(val failpoint.Value) {
            if i%10 == val.(int) {
                failpoint.Continue("label")
            }
        })
    }

After rewriting:

    label:
    for i := 0; i < 100; i++ {
        if i%5 == 0 {
            break label
        }
        if ok, val := failpoint.Eval(_curpkg_("control-flow")); ok {
            if i%10 == val.(int) {
                continue label
            }
        }
    }

26. failpoint.Goto
● Can be used anywhere
● Goes to a label

    label:
    for i := 0; i < 100; i++ {
        if i%50 == 0 {
            break label
        }
        failpoint.Inject("control-flow", func(val failpoint.Value) {
            if i%10 == val.(int) {
                failpoint.Goto("label")
            }
        })
    }

After rewriting:

    label:
    for i := 0; i < 100; i++ {
        if i%50 == 0 {
            break label
        }
        if ok, val := failpoint.Eval(_curpkg_("control-flow")); ok {
            if i%10 == val.(int) {
                goto label
            }
        }
    }

27. failpoint.Fallthrough
● Can be used in a SWITCH CASE context

    switch x := rand.Intn(10); {
    case x < 2:
        failpoint.Fallthrough()
    case x < 5:
        fmt.Println("too small")
    default:
        fmt.Println("hello fallthrough")
    }

After rewriting:

    switch x := rand.Intn(10); {
    case x < 2:
        fallthrough
    case x < 5:
        fmt.Println("too small")
    default:
        fmt.Println("hello fallthrough")
    }

28. failpoint.Return

    func test() (int, string) {
        failpoint.Inject("failpoint-name", func(val failpoint.Value) (int, string) {
            return val.(int), "demo string"
        })
    }

After rewriting:

    func test() (int, string) {
        if ok, val := failpoint.Eval(_curpkg_("failpoint-name")); ok {
            return val.(int), "demo string"
        }
    }

× Returning from the closure: every path in the closure returns, so once the failpoint is enabled the function always returns early and the remaining logic never runs

    func test() (int, string) {
        failpoint.Inject("failpoint-name", func(val failpoint.Value) (int, string) {
            if val.(int) < 100 {
                return rand.Intn(100), "demo string"
            }
            return val.(int), "unexpected early return"
        })
        // remain logic
    }

After rewriting:

    func test() (int, string) {
        if ok, val := failpoint.Eval(_curpkg_("failpoint-name")); ok {
            if val.(int) < 100 {
                return rand.Intn(100), "demo string"
            }
            return val.(int), "unexpected early return"
        }
        // remain logic
    }

√ failpoint.Return: the function returns only where failpoint.Return is called; otherwise execution continues with the remaining logic

    func test() (int, string) {
        failpoint.Inject("failpoint-name", func(val failpoint.Value) {
            if val.(int) < 100 {
                failpoint.Return(rand.Intn(100), "demo string")
            }
        })
        // remain logic
    }

After rewriting:

    func test() (int, string) {
        if ok, val := failpoint.Eval(_curpkg_("failpoint-name")); ok {
            if val.(int) < 100 {
                return rand.Intn(100), "demo string"
            }
        }
        // remain logic
    }
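A runnable version of the recommended (√) variant, with a hypothetical registry in place of failpoint.Eval. To keep results deterministic, the slide's `rand.Intn(100)` is replaced by `val.(int) * 2`, and the "remain logic" is a plain final return.

```go
package main

import "fmt"

// Hypothetical registry standing in for failpoint.Eval.
var enabled = map[string]interface{}{}

func Eval(name string) (bool, interface{}) {
	val, ok := enabled[name]
	return ok, val
}

// test is the rewritten form of the recommended variant: the injected
// return happens only when val < 100; otherwise the remaining logic runs.
func test() (int, string) {
	if ok, val := Eval("failpoint-name"); ok {
		if val.(int) < 100 {
			return val.(int) * 2, "demo string" // was failpoint.Return(...)
		}
	}
	// remaining logic
	return 0, "regular result"
}

func main() {
	fmt.Println(test()) // 0 regular result
	enabled["failpoint-name"] = 7
	fmt.Println(test()) // 14 demo string
	enabled["failpoint-name"] = 500
	fmt.Println(test()) // 0 regular result
}
```

With the × variant, the value 500 would instead have produced `500, "unexpected early return"` and skipped the remaining logic.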

29. All markers

    failpoint.Label("outer")
    for i := 0; i < 100; i++ {
        failpoint.Label("inner")
        for j := 0; j < 1000; j++ {
            switch rand.Intn(j) + i {
            case j / 5:
                failpoint.Break()
            case j / 7:
                failpoint.Continue("outer")
            case j / 9:
                failpoint.Fallthrough()
            case j / 10:
                failpoint.Goto("outer")
            default:
                failpoint.Inject("failpoint-name", func(val failpoint.Value) {
                    fmt.Println("unit-test", val.(int))
                    if val == j/11 {
                        failpoint.Break("inner")
                    } else {
                        failpoint.Goto("outer")
                    }
                })
            }
        }
    }

After rewriting:

    outer:
    for i := 0; i < 100; i++ {
    inner:
        for j := 0; j < 1000; j++ {
            switch rand.Intn(j) + i {
            case j / 5:
                break
            case j / 7:
                continue outer
            case j / 9:
                fallthrough
            case j / 10:
                goto outer
            default:
                if ok, val := failpoint.Eval(_curpkg_("failpoint-name")); ok {
                    fmt.Println("unit-test", val.(int))
                    if val == j/11 {
                        break inner
                    } else {
                        goto outer
                    }
                }
            }
        }
    }

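TiDB is a converged database product positioned for hybrid transactional and analytical processing (HTAP: Hybrid Transactional/Analytical Processing). It offers one-click horizontal scaling, strongly consistent multi-replica data safety, distributed transactions, real-time OLAP, and other key features.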