Troubleshooting Crashing Issues

这个网络研讨会是为每个MySQL用户准备的!在本文中,我不会重点讨论如何分析核心文件、读取源代码和设置断点。相反,我将关注任何人都可以使用的技术,即使是新手。
许多教程,包括我自己的,基于RoelvandePaar的视频编写的,建议如何创建和分析崩溃时创建的核心文件。虽然这是最强大的故障排除技术之一,但并不是唯一的技术。对于任何级别的MySQL用户,都有更简单和可访问的方法。
您将了解MySQL服务器崩溃的原因、如何识别崩溃的来源以及如何在将来防止崩溃。
“与MySQL服务器的连接断开”和“服务器已断开”是应用程序在MySQL服务器崩溃时看到的消息。
在本次网络研讨会中,我将讨论:
-先检查什么
-如何理解错误消息
-车祸的典型原因
-如何在不接触mysql源代码的情况下找到崩溃的原因
-如何读取回溯
我还将介绍这些技术的局限性,例如它们没有揭示什么,以及它们中最复杂的部分是什么。

展开查看详情

1.Troubleshooting Crashing Issues January, 25, 2018 Sveta Smirnova

2.Table of Contents •Is this Really Crash? •What to Check? •Typical Reasons for Crashes •How to Get and Read Backtraces •What to Do Next 2

3.Is this Really Crash?

4.Two main symptoms • 2013: Lost connection to MySQL server... • Got an error reading communication packets 4

5.They can happen, because • Connection error • Timeout • Server hangs • Normal shutdown • Crash 5

6.What to Check?

7.Was Server Restarted? • Uptime mysql> show global status like ’Uptime’; +---------------+-------+ | Variable_name | Value | +---------------+-------+ | Uptime | 8690 | +---------------+-------+ 1 row in set (0.00 sec) 7

8.Was Server Restarted? • Uptime mysql> show global status like ’Uptime’; +---------------+-------+ | Variable_name | Value | +---------------+-------+ | Uptime | 8690 | +---------------+-------+ 1 row in set (0.00 sec) • How much time passed after planned restart? 7

9.Was Server Restarted? • Uptime mysql> show global status like ’Uptime’; +---------------+-------+ | Variable_name | Value | +---------------+-------+ | Uptime | 8690 | +---------------+-------+ 1 row in set (0.00 sec) • How much time passed after planned restart? • Not reliable 7

10.Was Server Restarted? • Uptime • Error log file • Normal shutdown 2018-01-22T17:40:29.557089Z 0 [Note] Giving 1 client threads a chance to die gracefully ... 2018-01-22T17:40:31.572000Z 0 [Note] InnoDB: Starting shutdown... ... 2018-01-22T17:40:33.313018Z 0 [Note] /home/sveta.smirnova/5.7.14/bin/mysqld: Shutdown complete 7

11.Was Server Restarted? • Uptime • Error log file • Crash 2018-01-18T18:47:46.749649Z 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung. 2018-01-18 13:47:46 0x7eff727ff700 InnoDB: Assertion failure in thread 139635602749184 in file ut0ut.cc line 917 InnoDB: We intentionally generate a memory trap. ... Version: ’5.7.14-8’ socket: ’/tmp/mysql_sandbox5714.sock’ port: 5714 Percona Server (GPL), Release 8, Revision 1f84ccd ... 7

12.Communication Errors and Timeouts • Aborted % mysql> show status like ’Aborted%’; +------------------+-------+ | Variable_name | Value | +------------------+-------+ | Aborted_clients | 541 | | Aborted_connects | 683 | +------------------+-------+ 2 rows in set (0.00 sec) 8

13.Communication Errors and Timeouts • Aborted % • log warnings > 1 2018-01-22T17:51:58.542771Z 3 [Note] Aborted connection 3 to db: ’unconnected’ user: ’root’ host: ’localhost’ (Got an error reading communication packets) 8

14.Communication Errors and Timeouts • Aborted % • log warnings > 1 • Processlist mysql> show processlist; +----+------+-----------------+------+---------+------+----------+------------------+ | Id | User | Host | db | Command | Time | State | Info | +----+------+-----------------+------+---------+------+----------+------------------+ | 4 | root | localhost:38620 | NULL | Query | 0 | starting | show processlist | +----+------+-----------------+------+---------+------+----------+------------------+ 1 row in set (0.00 sec) 8

15.Communication Errors and Timeouts • Aborted % • log warnings > 1 • Processlist mysql> show processlist; +----+------+-----------------+------+---------+------+----------+------------------+ | Id | User | Host | db | Command | Time | State | Info | +----+------+-----------------+------+---------+------+----------+------------------+ | 4 | root | localhost:38620 | NULL | Query | 0 | starting | show processlist | +----+------+-----------------+------+---------+------+----------+------------------+ 1 row in set (0.00 sec) • Watch if connections still open 8

16.Server Hangs • Processlist ... | 732 | msa ... | 8645 | statistics | SELECT c FROM sbtest1 WHERE id=5028787 | | 733 | msa... | 8645 | statistics | SELECT c FROM sbtest1 WHERE id=3023457 | | 734 | msa... | 8645 | statistics | SELECT c FROM sbtest1 WHERE id=4980329 | | 735 | msa... | 8645 | statistics | SELECT c FROM sbtest1 WHERE id=4995094 | | 736 | msa... | 8645 | statistics | SELECT c FROM sbtest1 WHERE id=5263263 | | 739 | msa... | 0 | starting | show processlist | +-----+----------+------+------------------------+----------------------------------------+ 735 rows in set (0.00 sec) 9

17.Server Hangs • Processlist • Error log 2018-01-19T20:38:43.381127Z 0 [Warning] InnoDB: A long semaphore wait: --Thread 139970010412800 has waited at ibuf0ibuf.cc line 3454 for 321.00 seconds the semaphore: S-lock on RW-latch at 0x7f4dde2ea310 created in file buf0buf.cc line 1453 a writer (thread id 139965530261248) has reserved it in mode exclusive number of readers 0, waiters flag 1, lock_word: fffffffff0000000 Last time read locked in file ibuf0ibuf.cc line 3454 Last time write locked in file /mnt/workspace/percona-server-5.7-binaries-release/label_exp/ debian-wheezy-x64/percona-server-5.7.14-8/storage/innobase/btr/btr0btr.cc line 177 2018-01-19T20:38:43.381143Z 0 [Warning] InnoDB: A long semaphore wait: --Thread 139965135804160 has waited at buf0buf.cc line 4196 for 321.00 seconds the semaphore: S-lock on RW-latch at 0x7f4f257d33c0 created in file hash0hash.cc line 353 a writer (thread id 139965345621760) has reserved it in mode exclusive number of readers 0, waiters flag 1, lock_word: 0 Last time read locked in file buf0buf.cc line 4196 9 Last time write locked in file ...

18.Crash • Error Log Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 7fdd70bf1df0 thread_stack 0x800000 /home/sveta.smirnova/5.7.14/bin/mysqld(my_print_stacktrace+0x2e)[0xfeb29e] /home/sveta.smirnova/5.7.14/bin/mysqld(handle_fatal_signal+0x4a1)[0x942931] /lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7fe00f5ee330] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7fe00d775c37] /lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7fe00d779028] /home/sveta.smirnova/5.7.14/bin/mysqld[0x9130f3] /home/sveta.smirnova/5.7.14/bin/mysqld(_Z26btr_pcur_move_to_next_pageP10btr_pcur_tP5mtr_t+0x1ff)[0 /home/sveta.smirnova/5.7.14/bin/mysqld[0x1187413] /home/sveta.smirnova/5.7.14/bin/mysqld(_Z15row_search_mvccPh15page_cur_mode_tP14row_prebuilt_tmm+0 /home/sveta.smirnova/5.7.14/bin/mysqld(_ZN11ha_innobase13general_fetchEPhjj+0x1d2)[0x10933d2] /home/sveta.smirnova/5.7.14/bin/mysqld(_ZN7handler13ha_index_nextEPh+0xe1)[0x99be61] 10

19.Crash • Error Log /home/sveta.smirnova/5.7.14/bin/mysqld(_ZN7handler15read_range_nextEv+0x29)[0x9a0349] /home/sveta.smirnova/5.7.14/bin/mysqld(_ZN7handler21multi_range_read_nextEPPc+0xd2)[0x997762] /home/sveta.smirnova/5.7.14/bin/mysqld(_ZN18QUICK_RANGE_SELECT8get_nextEv+0x52)[0xf0cee2] /home/sveta.smirnova/5.7.14/bin/mysqld[0xd6b3f6] /home/sveta.smirnova/5.7.14/bin/mysqld(_Z10sub_selectP4JOINP7QEP_TABb+0x276)[0xdd0c06] /home/sveta.smirnova/5.7.14/bin/mysqld(_ZN4JOIN4execEv+0x26f)[0xdcac4f] /home/sveta.smirnova/5.7.14/bin/mysqld(_Z12handle_queryP3THDP3LEXP12Query_resultyy+0x18d)[0xe2b75d /home/sveta.smirnova/5.7.14/bin/mysqld[0x909afa] /home/sveta.smirnova/5.7.14/bin/mysqld(_Z21mysql_execute_commandP3THDb+0x31a4)[0xdf62d4] /home/sveta.smirnova/5.7.14/bin/mysqld(_Z11mysql_parseP3THDP12Parser_state+0x5d5)[0xdf8a25] /home/sveta.smirnova/5.7.14/bin/mysqld(_Z16dispatch_commandP3THDPK8COM_DATA19enum_server_command+0 /home/sveta.smirnova/5.7.14/bin/mysqld(_Z10do_commandP3THD+0x1cd)[0xdfa40d] /home/sveta.smirnova/5.7.14/bin/mysqld(handle_connection+0x2a4)[0xeaf8a4] /home/sveta.smirnova/5.7.14/bin/mysqld(pfs_spawn_thread+0x1b4)[0x101a3a4] /lib/x86_64-linux-gnu/libpthread.so.0(+0x8184)[0x7fe00f5e6184] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fe00d83cffd] 10

20.Crash • Error Log Trying to get some variables. Some pointers may be invalid and cause the dump to abort. Query (7fde2ba52030): SELECT DISTINCT c FROM sbtest1 WHERE id BETWEEN 5008118 AND 5008118+99 ORDER BY c Connection ID (thread ID): 234 10

21.Crash • Error Log • General query log $ tail var/mysqld.1/mysqld.log 2018-01-22T21:23:21.493490Z 4 Query insert into big1 select * from a 2018-01-22T21:23:47.847059Z 4 Query insert into big2 select * from a 2018-01-22T21:24:22.695518Z 4 Query set global log_output=’file’ 2018-01-22T21:24:27.741938Z 4 Query set global general_log=0 /home/sveta/build/ps-5.7/bin/mysqld, Version: 5.7.20-19-debug-log. started with: Tcp port: 13001 Unix socket: /home/sveta/build/ps-5.7/mysql-test/var/tmp/mysqld.1.sock Time Id Command Argument 2018-01-22T21:24:38.289933Z 4 Query set session optimizer_switch=’duplicateweedout=off, materialization=on’ 2018-01-22T21:24:49.757008Z 4 Query select * from big1 where id in (select id from big2 ignore index (primary)) limit 10 10

22.Crash • Error Log • General query log • Audit log $ tail -n 14 var/mysqld.1/data/audit.log <AUDIT_RECORD NAME="Query" RECORD="7727_2018-01-22T21:33:54" TIMESTAMP="2018-01-22T21:34:26 UTC" ... SQLTEXT="set session optimizer_switch=’duplicateweedout=off,materialization=on’" USER="root[root] @ localhost [127.0.0.1]" HOST="localhost" OS_USER="" IP="127.0.0.1" DB="test" 10 />

23.Crash • Error Log • General query log • Audit log • May not include crashing statement! 10

24.Crash • Error Log • General query log • Audit log • Your application logs 10

25.Crash • Error Log • General query log • Audit log • Your application logs • System logs • Redirect from error and audit logs 10

26.Typical Reasons for Crashes

27.Major Groups • Data corruption • Hardware access failures • Hardware broken • Hardware not available No disk space Out of memory OS killed mysqld • Software bug 12

28.Data Corruption • Always reported in the error log file 13

29.Data Corruption • Always reported in the error log file InnoDB: but the tablespace does not exist or is just being dropped. 2017-03-27 20:04:36 7fd1c4d3e700 InnoDB: Error: trying to access tablespace 1731 page no. 3, InnoDB: but the tablespace does not exist or is just being dropped. InnoDB: Error: Unable to read tablespace 1731 page no 3 into the buffer pool after 100 attempts InnoDB: The most probable cause of this error may be that the table has been corrupted. InnoDB: You can try to fix this problem by using innodb_force_recovery. InnoDB: Please see reference manual for more details. InnoDB: Aborting... 2017-03-27 20:04:36 7fd1c4d3e700 InnoDB: Assertion failure in thread 140538927113984 in file buf0buf.cc line 2740 13