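Symptoms:
An old server on the internal network failed: services were down and rebooting did not help. The operating system is CentOS 6. On logging in, the system was found to have crashed, and after every reboot it crashed again within one minute.
Resolution:
Step 1: Capture logs
Attach an external monitor and observe the failure: the system crashes again within one minute of rebooting and prints error logs to the console.
After a reboot, use the short window before the next crash to log in quickly over SSH and inspect the kernel messages: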
dmesg | grep -i "panic\|oops\|error"
ata5: SError: { PHYRdyChg 10B8B DevExch }
res 40/00:00:08:00:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
ata5: SError: { PHYRdyChg 10B8B DevExch }
ata5: SError: { PHYRdyChg 10B8B DevExch }
EXT4-fs (dm-3): warning: mounting fs with errors, running e2fsck is recommended
# ATA bus error (points to a disk or storage device problem)
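To see which ATA port the errors cluster on, the matching lines can be tallied per port. This is a minimal sketch and not part of the original procedure: the function name count_ata_errors is hypothetical, and it assumes the kernel messages use the `ataN: SError` form shown in the output above.

```shell
# count_ata_errors: read kernel messages on stdin and print "<port> <count>"
# for every ATA port that logged an SError line.
# (Hypothetical helper name; assumes the "ataN: SError" message format.)
count_ata_errors() {
    grep -o 'ata[0-9][0-9]*: SError' \
        | cut -d: -f1 \
        | sort \
        | uniq -c \
        | awk '{print $2, $1}'
}

# Usage on the affected host (run inside the short window after boot):
#   dmesg | count_ata_errors
```

A port that shows up repeatedly, as ata5 does here, is the one whose disk, cable, or backplane slot is worth checking first.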
Step 2: Dump the system log file
Watching the log with tail -f /var/log/messages captured the following output just before the reboot:
Jul 5 12:22:07 ipsan102321 kernel: EXT4-fs error (device dm-3): ext4_ext_find_extent: bad header/extent in inode #218106339: invalid magic - magic a6d9, entries 6315, max 49732(0), depth 52810(0)
Jul 5 12:22:07 ipsan102321 kernel: EXT4-fs error (device dm-3): file system corruption: inode #218106339 logical block 0 mapped to 0 (size 1)
Jul 5 12:22:07 ipsan102321 kernel: EXT4-fs error (device dm-3): ext4_ext_remove_space: bad header/extent in inode #218106339: invalid magic - magic a6d9, entries 6315, max 49732(0), depth 52810(0)
Jul 5 12:22:07 ipsan102321 kernel: EXT4-fs error (device dm-3): ext4_ext_find_extent: bad header/extent in inode #218106343: invalid magic - magic 33a8, entries 49551, max 62955(0), depth 3133(0)
Message from syslogd@ipsan102321 at Jul 5 12:22:08 ...
kernel:------------[ cut here ]------------
Message from syslogd@ipsan102321 at Jul 5 12:22:08 ...
kernel:invalid opcode: 0000 [#1] SMP
Message from syslogd@ipsan102321 at Jul 5 12:22:08 ...
kernel:last sysfs file: /sys/devices/pci0000:00/0000:00:1c.5/0000:06:00.0/host14/target14:0:0/14:0:0:0/vendor
Message from syslogd@ipsan102321 at Jul 5 12:22:08 ...
kernel:Stack:
Message from syslogd@ipsan102321 at Jul 5 12:22:08 ...
kernel:Call Trace:
Message from syslogd@ipsan102321 at Jul 5 12:22:08 ...
kernel:Code: 00 4d 89 ae 38 03 00 00 45 89 86 48 03 00 00 66 ff 00 66 66 90 48 8b 5d d8 4c 8b 65 e0 4c 8b 6d e8 4c 8b 75 f0 4c 8b 7d f8 c9 c3 <0f> 0b eb fe 66 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41
Jul 5 12:22:08 ipsan102321 kernel: ------------[ cut here ]------------
Jul 5 12:22:08 ipsan102321 kernel: kernel BUG at fs/ext4/extents.c:1975!
Jul 5 12:22:08 ipsan102321 kernel: invalid opcode: 0000 [#1] SMP
Jul 5 12:22:08 ipsan102321 kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:1c.5/0000:06:00.0/host14/target14:0:0/14:0:0:0/vendor
Jul 5 12:22:08 ipsan102321 kernel: CPU 0
Jul 5 12:22:08 ipsan102321 kernel: Modules linked in: xfs vfat fat ext2 iptable_filter ip_tables coretemp(U) hwmon pcspkr iscsi_scst(U) crc32c_intel nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc scst_vdisk(U) libcrc32c scst_disk(U) scst(U) cbd(U) bwtrace(U) ipv6 ext4 jbd2 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx kvm serio_raw sg e1000e(U) ext3 jbd mbcache sd_mod crc_t10dif ahci video output dm_mod [last unloaded: scsi_wait_scan]
Jul 5 12:22:08 ipsan102321 kernel:
Jul 5 12:22:08 ipsan102321 kernel: Modules linked in: xfs vfat fat ext2 iptable_filter ip_tables coretemp(U) hwmon pcspkr iscsi_scst(U) crc32c_intel nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc scst_vdisk(U) libcrc32c scst_disk(U) scst(U) cbd(U) bwtrace(U) ipv6 ext4 jbd2 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx kvm serio_raw sg e1000e(U) ext3 jbd mbcache sd_mod crc_t10dif ahci video output dm_mod [last unloaded: scsi_wait_scan]
Jul 5 12:22:08 ipsan102321 kernel: Pid: 8618, comm: msu Not tainted 2.6.32-71.el6vsds.x86_64 #1 To be filled by O.E.M.
Jul 5 12:22:08 ipsan102321 kernel: RIP: 0010:[<ffffffffa01b0a96>] [<ffffffffa01b0a96>] ext4_ext_put_in_cache+0x86/0x90 [ext4]
Jul 5 12:22:08 ipsan102321 kernel: RSP: 0018:ffff8800637236b8 EFLAGS: 00010246
Jul 5 12:22:08 ipsan102321 kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
Jul 5 12:22:08 ipsan102321 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88005e8444c8
Jul 5 12:22:08 ipsan102321 kernel: RBP: ffff8800637236f8 R08: 0000000000000002 R09: 0000000000000000
Jul 5 12:22:08 ipsan102321 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
Jul 5 12:22:08 ipsan102321 kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
Jul 5 12:22:08 ipsan102321 kernel: FS: 0000000000000000(0000) GS:ffff88002c200000(0063) knlGS:00000000ce4fcb90
Jul 5 12:22:08 ipsan102321 kernel: CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
Jul 5 12:22:08 ipsan102321 kernel: CR2: 00000000f6339000 CR3: 0000000067756000 CR4: 00000000000406f0
Jul 5 12:22:08 ipsan102321 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 5 12:22:08 ipsan102321 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jul 5 12:22:08 ipsan102321 kernel: Process msu (pid: 8618, threadinfo ffff880063722000, task ffff88006376aab0)
Jul 5 12:22:08 ipsan102321 kernel: Stack:
Jul 5 12:22:08 ipsan102321 kernel: ffff88005e8444c8 ffff880073070e00 ffff880063723738 ffff88005e8444c8
Jul 5 12:22:08 ipsan102321 kernel: <0> ffff8800018e0cf4 0000000000000000 0000000000000000 0000000000000001
Jul 5 12:22:08 ipsan102321 kernel: <0> ffff880063723858 ffffffffa01b353b ffff880000000000 0000000000000c3d
Jul 5 12:22:08 ipsan102321 kernel: Call Trace:
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffffa01b353b>] ext4_ext_get_blocks+0x2cb/0x1800 [ext4]
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffffa01b33b6>] ? ext4_ext_get_blocks+0x146/0x1800 [ext4]
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffff812668ee>] ? __sg_alloc_table+0x7e/0x130
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffff81348ec0>] ? scsi_sg_alloc+0x0/0x60
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffffa019249a>] ext4_get_blocks+0x7a/0x2a0 [ext4]
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffffa0192f8d>] ext4_get_block+0xbd/0x120 [ext4]
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffff8119e038>] block_read_full_page+0x188/0x3c0
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffffa0192ed0>] ? ext4_get_block+0x0/0x120 [ext4]
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffffa0192f8d>] ? ext4_get_block+0xbd/0x120 [ext4]
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffff811a5a0f>] do_mpage_readpage+0x3bf/0x5f0
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffff8110be79>] ? add_to_page_cache_locked+0xc9/0x140
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffff811a5d99>] mpage_readpages+0xe9/0x130
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffffa0192ed0>] ? ext4_get_block+0x0/0x120 [ext4]
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffffa0053786>] ? journal_stop+0x1e6/0x2c0 [jbd]
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffffa0192ed0>] ? ext4_get_block+0x0/0x120 [ext4]
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffffa018e7cd>] ext4_readpages+0x1d/0x20 [ext4]
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffff81120a35>] __do_page_cache_readahead+0x185/0x210
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffff81120ae1>] ra_submit+0x21/0x30
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffff81120e55>] ondemand_readahead+0x115/0x240
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffff81121073>] page_cache_sync_readahead+0x33/0x50
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffff8110cf99>] generic_file_aio_read+0x559/0x730
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffff8116b70a>] do_sync_read+0xfa/0x140
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffff8117c36d>] ? do_filp_open+0x60d/0xd40
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffff81091a30>] ? autoremove_wake_function+0x0/0x40
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffff811fe146>] ? security_file_permission+0x16/0x20
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffff8116c135>] vfs_read+0xb5/0x1a0
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffff810d3b72>] ? audit_syscall_entry+0x272/0x2a0
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffff8116c271>] sys_read+0x51/0x90
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffff8104c81f>] sysenter_dispatch+0x7/0x2e
Jul 5 12:22:08 ipsan102321 kernel: Code: 00 4d 89 ae 38 03 00 00 45 89 86 48 03 00 00 66 ff 00 66 66 90 48 8b 5d d8 4c 8b 65 e0 4c 8b 6d e8 4c 8b 75 f0 4c 8b 7d f8 c9 c3 <0f> 0b eb fe 66 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41
Jul 5 12:22:08 ipsan102321 kernel: RIP [<ffffffffa01b0a96>] ext4_ext_put_in_cache+0x86/0x90 [ext4]
Jul 5 12:22:08 ipsan102321 kernel: RSP <ffff8800637236b8>
Jul 5 12:22:08 ipsan102321 kernel: ---[ end trace cd4b2b2bbd6bbd7f ]---
Message from syslogd@ipsan102321 at Jul 5 12:22:08 ...
kernel:Kernel panic - not syncing: Fatal exception
Jul 5 12:22:08 ipsan102321 kernel: Kernel panic - not syncing: Fatal exception
Jul 5 12:22:08 ipsan102321 kernel: Pid: 8618, comm: msu Tainted: G D ---------------- 2.6.32-71.el6vsds.x86_64 #1
Jul 5 12:22:08 ipsan102321 kernel: Call Trace:
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffff814c716e>] panic+0x78/0x137
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffff814cb234>] oops_end+0xe4/0x100
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffff8101732b>] die+0x5b/0x90
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffff814caae4>] do_trap+0xc4/0x160
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffff81014ee5>] do_invalid_op+0x95/0xb0
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffffa01b0a96>] ? ext4_ext_put_in_cache+0x86/0x90 [ext4]
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffff81091a70>] ? wake_bit_function+0x0/0x50
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffff8119c249>] ? __find_get_block+0xa9/0x200
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffff8119ccb6>] ? __wait_on_buffer+0x26/0x30
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffff8119dd20>] ? sync_dirty_buffer+0x70/0xf0
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffff81013f5b>] invalid_op+0x1b/0x20
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffffa01b0a96>] ? ext4_ext_put_in_cache+0x86/0x90 [ext4]
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffffa01b353b>] ext4_ext_get_blocks+0x2cb/0x1800 [ext4]
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffffa01b33b6>] ? ext4_ext_get_blocks+0x146/0x1800 [ext4]
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffff812668ee>] ? __sg_alloc_table+0x7e/0x130
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffff81348ec0>] ? scsi_sg_alloc+0x0/0x60
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffffa019249a>] ext4_get_blocks+0x7a/0x2a0 [ext4]
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffffa0192f8d>] ext4_get_block+0xbd/0x120 [ext4]
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffff8119e038>] block_read_full_page+0x188/0x3c0
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffffa0192ed0>] ? ext4_get_block+0x0/0x120 [ext4]
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffffa0192f8d>] ? ext4_get_block+0xbd/0x120 [ext4]
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffff811a5a0f>] do_mpage_readpage+0x3bf/0x5f0
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffff8110be79>] ? add_to_page_cache_locked+0xc9/0x140
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffff811a5d99>] mpage_readpages+0xe9/0x130
Jul 5 12:22:08 ipsan102321 kernel: [<ffffffffa0192ed0>] ? ext4_get_block+0x0/0x120 [ext4]
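Because the report above is long, it can help to cut just the oops out of a saved copy of the log. The following is a minimal sketch, assuming the log has first been copied off the failing machine; extract_oops is a hypothetical helper name, and the two markers it looks for are the `cut here` and `end trace` lines visible in the output above.

```shell
# extract_oops: print every line from the "[ cut here ]" marker through the
# "[ end trace ...]" marker. Reads the named files, or stdin if none given.
# (Hypothetical helper name; the markers are taken from the log above.)
extract_oops() {
    sed -n '/\[ cut here ]/,/\[ end trace /p' "$@"
}

# Usage, on a saved copy of the log:
#   extract_oops messages.copy | grep 'kernel BUG'
```

Lines before the first marker, such as the EXT4-fs error messages, are deliberately left out so that the BUG line, registers, and call trace can be read on their own.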
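The log shows a filesystem fault: kernel BUG at fs/ext4/extents.c:1975!
It also shows repeated EXT4-fs error (device dm-3) messages.
This is a filesystem fault, most likely caused by a RAID array failure or a disk failure.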
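To confirm that every error points at the same device, and to list which inodes are affected, the EXT4-fs error lines can be reduced to device and inode pairs. This is a sketch under the assumption that the messages follow the `EXT4-fs error (device X): ... inode #N` layout seen in the log; ext4_error_inodes is a hypothetical name.

```shell
# ext4_error_inodes: read syslog lines on stdin and print one
# "<device> <inode>" pair per distinct corrupted inode.
# (Hypothetical helper name; assumes the message layout shown in the log.)
ext4_error_inodes() {
    sed -n 's/.*EXT4-fs error (device \([^)]*\)).*inode #\([0-9]*\).*/\1 \2/p' \
        | sort -u
}

# Usage, on a saved copy of the log:
#   ext4_error_inodes < messages.copy
```

If all pairs name a single device, as with dm-3 here, the search for the failing hardware can be limited to whatever sits underneath that device.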
Step 3: Attempt a fix
dmsetup info /dev/dm-3
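Identify the logical partition behind dm-3, then identify the RAID array it sits on.
Pull the disk and reset the array; this cleared the fault and the system recovered.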
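The note does not say how dm-3 was traced back to its array. One possible way, sketched here as an assumption rather than the method actually used, is to follow the `slaves` directories that the kernel exposes under /sys/block. The function name walk_slaves and its optional second argument (an alternative sysfs root, useful for dry runs) are hypothetical; `local` is used, so the sketch needs bash or another shell that supports it.

```shell
# walk_slaves DEV [SYSFS_ROOT]: print DEV and, indented beneath it, every
# block device it is stacked on, by following <root>/block/<dev>/slaves.
# (Hypothetical helper; SYSFS_ROOT defaults to /sys.)
walk_slaves() {
    local dev="$1" root="${2:-/sys}" indent="${3:-}" s
    printf '%s%s\n' "$indent" "$dev"
    for s in "$root/block/$dev/slaves"/*; do
        # An unmatched glob stays literal; skip it so leaf devices end here.
        [ -e "$s" ] || continue
        walk_slaves "${s##*/}" "$root" "$indent  "
    done
}

# Usage on the affected host:
#   walk_slaves dm-3
```

The innermost entries are the devices to compare against the ATA port reporting errors in step 1.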