Advanced Disk Activity Tracking Tool: iotrace

This talk explains why disk IO activity needs to be tracked from the perspective of processes and presents the design of "iotrace", an advanced tracking tool based on trace events. The audience will come away understanding why per-process disk IO tracking is necessary and with hands-on experience using the tool.

1. ioTrace: Another Disk Activity Tracing Tool. Ahao Mu (ahao.mah@alibaba-inc.com), June 26, 2018

2.Background •  Requirement proposed by Alibaba’s business line: process-centric tracking of disk activity. •  The currently available tools cannot meet this requirement.

3.Pain •  When disk bandwidth is saturated, the PID/TID responsible is unknown. •  This makes it difficult to narrow down the problematic processes/threads.

4.Disk IO Toolset
•  iotop
   –  Written in Python; reads from /proc/<pid>/io and /proc/diskstats.
   –  Lacks the DEVICE dimension.
•  iostat
   –  Written in C; reads from /proc/diskstats (see Documentation/iostats.txt).
   –  Not aware of processes.
•  blktrace
   –  Written in C; produces massive, hard-to-digest output.
   –  Tremendous performance overhead.
None of the above is ideal for our production environment.
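To make the /proc/diskstats source concrete, here is a minimal C sketch of parsing one line of that file into the 14 classic fields described in Documentation/iostats.txt. The struct and function names (`disk_sample`, `parse_diskstats_line`, `read_bytes_delta`) are hypothetical and are not taken from iostat or ioTrace. Newer kernels append extra fields to each line; this sketch simply ignores them.

```c
#include <assert.h>
#include <stdio.h>

/* One parsed /proc/diskstats line (hypothetical layout for illustration). */
struct disk_sample {
    unsigned int major, minor;
    char name[32];
    unsigned long long rd_ios, rd_merges, rd_sectors, rd_ticks;
    unsigned long long wr_ios, wr_merges, wr_sectors, wr_ticks;
    unsigned long long in_flight, io_ticks, time_in_queue;
};

/* Returns 0 on success, -1 if the line lacks the 14 classic fields. */
static inline int parse_diskstats_line(const char *line, struct disk_sample *s)
{
    assert(line != NULL && s != NULL);
    int n = sscanf(line,
                   "%u %u %31s %llu %llu %llu %llu %llu %llu %llu %llu %llu %llu %llu",
                   &s->major, &s->minor, s->name,
                   &s->rd_ios, &s->rd_merges, &s->rd_sectors, &s->rd_ticks,
                   &s->wr_ios, &s->wr_merges, &s->wr_sectors, &s->wr_ticks,
                   &s->in_flight, &s->io_ticks, &s->time_in_queue);
    return n == 14 ? 0 : -1;
}

/* Bytes read between two samples; diskstats counts 512-byte sectors. */
static inline unsigned long long read_bytes_delta(const struct disk_sample *prev,
                                                  const struct disk_sample *cur)
{
    return (cur->rd_sectors - prev->rd_sectors) * 512ULL;
}
```

Sampling the file twice and taking deltas per device gives per-device throughput, but nothing in these counters identifies the process that issued the IO, which is the gap the talk describes.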

5.Goal of ioTrace •  Aware of the PID/TID and DEVICE dimensions. •  Debugging and monitoring of disk activity. •  Lightweight, agile, and easy to daemonize in a production environment.

6.IO Stack

7.Techniques of ioTrace •  Works on top of the generic block layer. •  Based on the kernel blktrace API. •  Built with kernel tracepoints.
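As an illustration of the kernel blktrace API, the sketch below shows how a userspace tool can start and stop tracing on a block device with the BLKTRACESETUP, BLKTRACESTART, BLKTRACESTOP and BLKTRACETEARDOWN ioctls. The slides do not show how ioTrace itself does this, so treat the code as an assumption: the function names `start_block_trace` and `stop_block_trace` are hypothetical, and the buffer size and count are arbitrary. The calls need root privileges and a kernel built with blktrace support.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>            /* BLKTRACESETUP, BLKTRACESTART, ... */
#include <linux/blktrace_api.h>  /* struct blk_user_trace_setup, BLK_TC_* */

/*
 * Start tracing on dev_path for the categories in act_mask.
 * On success returns an open fd and copies the kernel-assigned trace
 * name into name_out; on failure returns -1.
 */
static inline int start_block_trace(const char *dev_path, unsigned short act_mask,
                                    char *name_out, size_t name_len)
{
    struct blk_user_trace_setup buts;

    assert(dev_path != NULL && name_out != NULL && name_len > 0);

    int fd = open(dev_path, O_RDONLY | O_NONBLOCK);
    if (fd < 0)
        return -1;

    memset(&buts, 0, sizeof(buts));
    buts.buf_size = 512 * 1024;  /* per-CPU sub-buffer size (arbitrary) */
    buts.buf_nr = 4;             /* number of sub-buffers (arbitrary) */
    buts.act_mask = act_mask;    /* e.g. BLK_TC_READ | BLK_TC_WRITE */

    if (ioctl(fd, BLKTRACESETUP, &buts) < 0) {
        close(fd);
        return -1;
    }
    if (ioctl(fd, BLKTRACESTART) < 0) {
        ioctl(fd, BLKTRACETEARDOWN);
        close(fd);
        return -1;
    }
    snprintf(name_out, name_len, "%s", buts.name);
    return fd;
}

/* Stop tracing and release the kernel-side buffers. */
static inline void stop_block_trace(int fd)
{
    ioctl(fd, BLKTRACESTOP);
    ioctl(fd, BLKTRACETEARDOWN);
    close(fd);
}
```

After a successful start, the blktrace utility reads the per-CPU event streams from debugfs; a collector that wants one stream per CPU would typically open one file per CPU and wait on all of them.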

8.The API the kernel provides

The statistics that ioTrace collects and manipulates:

    struct blk_io_trace {
        __u32 magic;     /* MAGIC << 8 | version */
        __u32 sequence;  /* event number */
        __u64 time;      /* in nanoseconds */
        __u64 sector;    /* disk offset */
        __u32 bytes;     /* transfer length */
        __u32 action;    /* what happened */
        __u32 pid;       /* who did it */
        __u32 device;    /* device identifier (dev_t) */
        __u32 cpu;       /* on what cpu did it happen */
        __u16 error;     /* completion error */
        __u16 pdu_len;   /* length of data after this trace */
    };

The stages of IO requests are represented by:

    enum {
        BLK_TC_READ     = 1 << 0,   /* reads */
        BLK_TC_WRITE    = 1 << 1,   /* writes */
        BLK_TC_FLUSH    = 1 << 2,   /* flush */
        BLK_TC_SYNC     = 1 << 3,   /* sync */
        BLK_TC_QUEUE    = 1 << 4,   /* queueing/merging */
        BLK_TC_REQUEUE  = 1 << 5,   /* requeueing */
        BLK_TC_ISSUE    = 1 << 6,   /* issue */
        BLK_TC_COMPLETE = 1 << 7,   /* completions */
        BLK_TC_FS       = 1 << 8,   /* fs requests */
        BLK_TC_PC       = 1 << 9,   /* pc requests */
        BLK_TC_NOTIFY   = 1 << 10,  /* special message */
        BLK_TC_AHEAD    = 1 << 11,  /* readahead */
        BLK_TC_META     = 1 << 12,  /* metadata */
        BLK_TC_DISCARD  = 1 << 13,  /* discard requests */
        BLK_TC_DRV_DATA = 1 << 14,  /* binary driver data */
        BLK_TC_FUA      = 1 << 15,  /* fua requests */
        BLK_TC_END      = 1 << 15,  /* we've run out of bits! */
    };
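The two definitions fit together through the action field: in the kernel header the category mask occupies the upper 16 bits of action (shifted by BLK_TC_SHIFT, which is 16) and the event code occupies the lower 16 bits. The sketch below decodes a record under that layout. It is a userspace mirror written for illustration, not ioTrace code: the struct name `io_trace_rec`, the helper names, and the shortened `TC_*` constants are hypothetical, and the 20-bit minor split for the device field follows the convention used by the blktrace utilities.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values mirrored from the kernel's blktrace_api.h. */
#define BLK_IO_TRACE_MAGIC   0x65617400u
#define BLK_IO_TRACE_VERSION 0x07u
#define BLK_TC_SHIFT         16

/* A few category bits, renamed to avoid clashing with the kernel header. */
enum { TC_READ = 1 << 0, TC_WRITE = 1 << 1, TC_COMPLETE = 1 << 7 };

/* Userspace mirror of struct blk_io_trace using fixed-width types. */
struct io_trace_rec {
    uint32_t magic, sequence;
    uint64_t time, sector;
    uint32_t bytes, action, pid, device, cpu;
    uint16_t error, pdu_len;
};
_Static_assert(sizeof(struct io_trace_rec) == 48, "record header is 48 bytes");

/* A record is valid when the top 24 bits are the magic and the low 8 the version. */
static inline bool rec_is_valid(const struct io_trace_rec *r)
{
    return (r->magic & 0xffffff00u) == BLK_IO_TRACE_MAGIC &&
           (r->magic & 0xffu) == BLK_IO_TRACE_VERSION;
}

/* Category mask (BLK_TC_* bits) stored in the upper half of action. */
static inline uint32_t rec_categories(const struct io_trace_rec *r)
{
    return r->action >> BLK_TC_SHIFT;
}

/* Event code stored in the lower half of action. */
static inline uint32_t rec_event(const struct io_trace_rec *r)
{
    return r->action & 0xffffu;
}

/* Split the device field into major/minor, assuming 20 minor bits. */
static inline unsigned int dev_major(uint32_t dev) { return dev >> 20; }
static inline unsigned int dev_minor(uint32_t dev) { return dev & ((1u << 20) - 1); }
```

With these helpers, a record whose categories include both the read bit and the completion bit can be counted as a finished read of `bytes` bytes for the `(pid, device)` pair, which is the kind of per-process, per-device accounting the tool aims for.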

9.The design of iotrace Key objects and components: 1. CPU List 2. Disk group 3. Epoll 4. Collect thread 5. Analyzer thread 6. Hash table record 7. Ranking logic

10.Functions of ioTrace •  Supports the TID, PID and DEVICE dimensions. •  Collects read_iops, write_iops, read_bytes, write_bytes, total_counts. •  Supports immediate output to the console and delayed JSON output to a remote database. •  Supports daemon mode and cron mode with systemd. •  Supports specifying the target DEVICE name for monitoring.
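The hash table record and ranking logic named in the design slide, together with the counters listed here, could be sketched roughly as follows. This is an assumption about the general technique, not the actual ioTrace implementation: the table size, the hash function, keying on `(pid, dev)` only, and all names (`io_table`, `table_account`, `table_top`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SLOTS 1024u  /* power of two; hypothetical capacity */

/* One hash table record: counters for a (pid, dev) pair. */
struct io_stat {
    uint32_t pid, dev;
    int used;
    uint64_t read_ios, write_ios, read_bytes, write_bytes;
};

struct io_table { struct io_stat slot[TABLE_SLOTS]; };

static inline uint32_t hash_key(uint32_t pid, uint32_t dev)
{
    return ((pid * 2654435761u) ^ (dev * 40503u)) & (TABLE_SLOTS - 1);
}

/* Find or create the record for (pid, dev) with linear probing; NULL if full. */
static inline struct io_stat *table_get(struct io_table *t, uint32_t pid, uint32_t dev)
{
    uint32_t i = hash_key(pid, dev);
    for (uint32_t n = 0; n < TABLE_SLOTS; n++, i = (i + 1) & (TABLE_SLOTS - 1)) {
        struct io_stat *s = &t->slot[i];
        if (!s->used) {
            s->used = 1;
            s->pid = pid;
            s->dev = dev;
            return s;
        }
        if (s->pid == pid && s->dev == dev)
            return s;
    }
    return NULL;
}

/* Account one completed IO; returns 0 on success, -1 if the table is full. */
static inline int table_account(struct io_table *t, uint32_t pid, uint32_t dev,
                                uint32_t bytes, int is_write)
{
    struct io_stat *s = table_get(t, pid, dev);
    if (s == NULL)
        return -1;
    if (is_write) { s->write_ios++; s->write_bytes += bytes; }
    else          { s->read_ios++;  s->read_bytes  += bytes; }
    return 0;
}

/* qsort comparator: larger total bytes first. */
static int cmp_total_desc(const void *a, const void *b)
{
    const struct io_stat *x = *(const struct io_stat *const *)a;
    const struct io_stat *y = *(const struct io_stat *const *)b;
    uint64_t tx = x->read_bytes + x->write_bytes;
    uint64_t ty = y->read_bytes + y->write_bytes;
    return (tx < ty) - (tx > ty);
}

/* Fill out[] with up to n busiest records; returns how many were written. */
static inline size_t table_top(struct io_table *t, struct io_stat **out, size_t n)
{
    struct io_stat *all[TABLE_SLOTS];
    size_t count = 0;
    for (uint32_t i = 0; i < TABLE_SLOTS; i++)
        if (t->slot[i].used)
            all[count++] = &t->slot[i];
    qsort(all, count, sizeof(all[0]), cmp_total_desc);
    if (n > count)
        n = count;
    memcpy(out, all, n * sizeof(all[0]));
    return n;
}
```

In a collector/analyzer split, the collecting side would call `table_account` for every completion event, and the analyzing side would call `table_top` once per interval, print or ship the result, and then clear the table. Dividing the IO counts by the interval length yields the per-second rates.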

11.Usage

Supports multiple arguments: target device, prompt output mode, daemonization or crond running mode, ranking output.

    #iotrace
    Usage: iotrace [ -d <dev> | --dev=<dev> ]
                   [ -m | --daemon ]
                   [ -c | --cron ]
                   [ -n <number> | --top_candidates=<pid top max>]
                   [ -f <filename> | --file=<configure file> ]
                   [ -v <version> | --version ]
                   [ -l <live> | --live ]
                   [ -i <interval> | --interval=<seconds> ]
                   [ -p <thread> | --thread=<count> ]
        -d Used to specify device
        -m Used to specify daemonize running or not
        -c Used to specify cron running or not
        -n Used to specify top candidates, defaults is 3
        -l Used to specify show data live or not
        -p Used to specify multiple thread max count
        -i Used to specify interval(second)
        -f Path to iotrace configure file, defaults to /etc/iotrace/iotrace.conf

e.g:

    #./iotrace -d all -li1
    #./iotrace -d /dev/sda,/dev/sdc -li1
    #./iotrace -c

12.Data Accuracy (ioTrace vs. iostat)

    Timestamp          Metric   ioTrace   iostat    Offset
    20180529 13:11:03  r_bytes  2890KB    2737KB    +5.5%
    20180529 13:11:04  r_bytes  13542KB   14052KB   -3.6%
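The slide does not say how the Offset column is computed. The figures are consistent with the relative difference of the ioTrace value against the iostat value, which the following small C helper sketches; both the formula and the function name `offset_pct` are assumptions.

```c
#include <assert.h>

/* Relative difference of the ioTrace reading against iostat, in percent. */
static inline double offset_pct(double iotrace_kb, double iostat_kb)
{
    assert(iostat_kb > 0.0);
    return (iotrace_kb - iostat_kb) / iostat_kb * 100.0;
}
```

For the two samples, `offset_pct(2890, 2737)` gives roughly 5.59 and `offset_pct(13542, 14052)` gives roughly -3.63, in line with the reported +5.5% and -3.6% if the values are truncated to one decimal place.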

13.Case Output from ioTrace: Output from SAR: disk util 100% Conclusion: kworker is the bottleneck

14.Case Output from ioTrace: Output from SAR: Conclusion: PID 125872 is suspicious

15.Thanks & Questions
