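Generation
1. Path of the error file: you can set it with the option -XX:ErrorFile=/path/hs_error%p.log. By default the file is written to the directory the Java process was started from [default: ./hs_err_pid%p.log].
2. The option -XX:OnError runs commands when the JVM exits on a crash. The format is -XX:OnError="string", where <string> can be a list of commands separated by semicolons, and "%p" expands to the ID of the current process.
For example: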
    // -XX:OnError="pmap %p"                 // show memory map
    // -XX:OnError="gcore %p; dbx - %p"      // dump core and launch debugger
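On Linux the JVM forks a child process to run the shell command. Because fork is used, there may not be enough memory for the child, so check your /proc/sys/vm/overcommit_memory setting. It is not clear to me why vfork is not used here.
3. The option -XX:+ShowMessageBoxOnError: when the JVM crashes on Linux, it starts gdb so that the process can be analyzed and debugged. This is best suited to test environments.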
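The "%p" substitution, the semicolon-separated command list, and the fork-then-shell step described in item 2 can be sketched as follows. This is only an illustration of the behavior described above, not HotSpot's actual code; the helper names expand_pid, split_commands and run_shell are made up for this example, and using /bin/sh as the shell is an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: replace every "%p" in the pattern with the given pid.
std::string expand_pid(const std::string& pattern, long pid) {
    std::string out;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        if (pattern[i] == '%' && i + 1 < pattern.size() && pattern[i + 1] == 'p') {
            out += std::to_string(pid);
            ++i;  // skip the 'p'
        } else {
            out += pattern[i];
        }
    }
    return out;
}

// Hypothetical helper: split a command list on ';' and trim blanks.
std::vector<std::string> split_commands(const std::string& list) {
    std::vector<std::string> cmds;
    std::size_t start = 0;
    while (start <= list.size()) {
        std::size_t end = list.find(';', start);
        if (end == std::string::npos) end = list.size();
        std::string cmd = list.substr(start, end - start);
        std::size_t first = cmd.find_first_not_of(" \t");
        std::size_t last = cmd.find_last_not_of(" \t");
        if (first != std::string::npos) {
            cmds.push_back(cmd.substr(first, last - first + 1));
        }
        start = end + 1;
    }
    return cmds;
}

// Run one command through /bin/sh in a forked child.
// Returns the exit status, or -1 if fork/wait fails or the child
// did not exit normally.
int run_shell(const std::string& cmd) {
    pid_t child = fork();
    if (child < 0) return -1;  // fork failed, for example for lack of memory
    if (child == 0) {
        execl("/bin/sh", "sh", "-c", cmd.c_str(), (char*)nullptr);
        _exit(127);            // only reached if exec itself failed
    }
    int status = 0;
    if (waitpid(child, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With these helpers, a value such as "gcore %p; dbx - %p" would be split into two commands, each expanded with the pid and passed to run_shell in turn. The fork failure branch is the case the overcommit_memory remark above is about.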
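When no error file is generated
When the Linux kernel runs out of memory (OOM) it forcibly kills some processes, and a JVM killed this way gets no chance to write an error file. You can look for the corresponding record in /var/log/messages.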
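A small sketch of how such a log could be scanned for OOM-killer records. The marker strings below are an assumption based on typical kernel messages and may differ between kernel versions; the function names are hypothetical.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Assumed markers: wording commonly seen in kernel OOM-killer messages.
bool looks_like_oom_kill(const std::string& line) {
    static const char* const markers[] = {
        "Out of memory", "oom-killer", "Killed process"};
    for (const char* marker : markers) {
        if (line.find(marker) != std::string::npos) return true;
    }
    return false;
}

// Collect every line of a log stream that looks like an OOM-killer record.
std::vector<std::string> find_oom_lines(std::istream& log) {
    std::vector<std::string> hits;
    std::string line;
    while (std::getline(log, line)) {
        if (looks_like_oom_kill(line)) hits.push_back(line);
    }
    return hits;
}
```

To use it on a real system, open the log with std::ifstream and pass the stream to find_oom_lines, then check whether any hit mentions the pid of the missing Java process.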
Important parts of the error (crash) file
a. Error summary
    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  SIGSEGV (0xb) at pc=0x0000000000043566, pid=32046, tid=1121192256
    #
    # JRE version: 6.0_17-b04
    # Java VM: Java HotSpot(TM) 64-Bit Server VM (14.3-b01 mixed mode linux-amd64)
    # Problematic frame:
    # C  0x0000000000043566
    #
    # If you would like to submit a bug report, please visit:
    #   http://java.sun.com/webapps/bugreport/crash.jsp
    # The crash happened outside the Java Virtual Machine in native code.
    # See problematic frame for where to report the bug.
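SIGSEGV is the type of the signal that caused the error.
pc is the value of the IP/PC register, that is, the address of the instruction being executed.
pid is the process ID.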
# Problematic frame:
# V [libjvm.so+0x593045]
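This is the address of the function in the shared library that caused the problem.
pc and +0x593045 refer to the same location: +0x593045 is the offset inside the shared library, while pc is the virtual address at run time.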
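The relationship between the two forms is plain address arithmetic: runtime address = load base of the library + offset. A minimal sketch, in which the function names are hypothetical and the load base is only an example value (the libjvm.so mapping address that appears in the register dump in section d, which may come from a different crash):

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Runtime virtual address of a location given the library's load base.
std::uint64_t to_runtime_address(std::uint64_t load_base, std::uint64_t offset) {
    return load_base + offset;
}

// Offset inside the library for a runtime pc; assumes pc >= load_base.
std::uint64_t to_library_offset(std::uint64_t load_base, std::uint64_t pc) {
    return pc - load_base;
}

// Format a value the way the crash report prints addresses.
std::string to_hex(std::uint64_t value) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "0x%016llx",
                  static_cast<unsigned long long>(value));
    return buf;
}
```

For example, with a load base of 0x00007f48fc206000 the frame libjvm.so+0x593045 would sit at 0x00007f48fc799045. The load base of each library for a given crash can be read from the memory map section of the same report.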
b. Signal information
The signal handler that the JVM registers on Linux takes, among others, the two parameters info and ucVoid:
    static void crash_handler(int sig, siginfo_t* info, void* ucVoid) {
      // unmask current signal
      sigset_t newset;
      sigemptyset(&newset);
      sigaddset(&newset, sig);
      sigprocmask(SIG_UNBLOCK, &newset, NULL);

      VMError err(NULL, sig, NULL, info, ucVoid);
      err.report_and_die();
    }
The signal information as it appears in the crash report:
siginfo:si_signo=SIGSEGV: si_errno=0, si_code=1 (SEGV_MAPERR), si_addr=0x0000000000043566
The details of the signal, including si_addr, the memory address that caused the fault, are all stored in the siginfo_t structure, which is the info parameter of the registered handler crash_handler. The kernel saves the faulting address into the user-space siginfo_t structure, so the process can read the address that caused the error inside its registered signal handler.
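To see how a handler receives siginfo_t, here is a minimal sketch that registers a three-argument handler with sigaction and SA_SIGINFO. To keep the example safe to run, it uses SIGUSR1 delivered with raise() rather than a real SIGSEGV, so si_addr is not meaningful here; the names demo_handler and deliver_and_capture are made up for the example.

```cpp
#include <cassert>
#include <signal.h>

namespace {
volatile sig_atomic_t g_seen_signo = 0;
volatile sig_atomic_t g_had_context = 0;

// Same three-argument shape as crash_handler above.
void demo_handler(int sig, siginfo_t* info, void* ucVoid) {
    // For a SIGSEGV, info->si_addr would hold the faulting address and
    // info->si_code the reason (for example SEGV_MAPERR).
    (void)sig;
    g_seen_signo = info->si_signo;
    g_had_context = (ucVoid != nullptr) ? 1 : 0;
}
}  // namespace

// Installs demo_handler for SIGUSR1, raises the signal once, restores the
// previous disposition, and returns the si_signo the handler saw
// (or -1 if the handler could not be installed).
int deliver_and_capture() {
    struct sigaction sa = {};
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;        // ask for the siginfo_t / ucontext form
    sa.sa_sigaction = demo_handler;

    struct sigaction old = {};
    if (sigaction(SIGUSR1, &sa, &old) != 0) return -1;
    raise(SIGUSR1);                  // handler runs before raise() returns
    sigaction(SIGUSR1, &old, nullptr);
    return g_seen_signo;
}
```

The SA_SIGINFO flag is what makes the kernel pass the second and third arguments; without it the handler would only receive the signal number.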
c. Register information
    Registers:
    RAX=0x00002aacb5ae5de2, RBX=0x00002aaaaf46aa48, RCX=0x0000000000000219, RDX=0x00002aaaaf46b920
    RSP=0x0000000042d3f968, RBP=0x0000000042d3f9c8, RSI=0x0000000042d3f9e8, RDI=0x0000000045aef9b8
    R8 =0x0000000000000f80, R9 =0x00002aaab3d30ce8, R10=0x00002aaaab138ea1, R11=0x00002b017ae65110
    R12=0x0000000042d3f6f0, R13=0x00002aaaaf46aa48, R14=0x0000000042d3f9e8, R15=0x0000000045aef800
    RIP=0x0000000000043566, EFL=0x0000000000010202, CSGSFS=0x0000000000000033, ERR=0x0000000000000014
      TRAPNO=0x000000000000000e
The register values come from the signal handler parameter described in part b, cast as (ucontext_t*)ucVoid.
On the x86 architecture:
    void os::print_context(outputStream *st, void *context) {
      if (context == NULL) return;

      ucontext_t *uc = (ucontext_t*)context;
      st->print_cr("Registers:");
    #ifdef AMD64
      st->print(  "RAX=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_RAX]);
      st->print(", RBX=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_RBX]);
      st->print(", RCX=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_RCX]);
      st->print(", RDX=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_RDX]);
      st->cr();
      st->print(  "RSP=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_RSP]);
      st->print(", RBP=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_RBP]);
      st->print(", RSI=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_RSI]);
      st->print(", RDI=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_RDI]);
      st->cr();
      st->print(  "R8 =" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_R8]);
      st->print(", R9 =" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_R9]);
      st->print(", R10=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_R10]);
      st->print(", R11=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_R11]);
      st->cr();
      st->print(  "R12=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_R12]);
      st->print(", R13=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_R13]);
      st->print(", R14=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_R14]);
      st->print(", R15=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_R15]);
      st->cr();
      st->print(  "RIP=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_RIP]);
      st->print(", EFL=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_EFL]);
      st->print(", CSGSFS=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_CSGSFS]);
      st->print(", ERR=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_ERR]);
      st->cr();
      st->print("  TRAPNO=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_TRAPNO]);
    #else
      st->print(  "EAX=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_EAX]);
      st->print(", EBX=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_EBX]);
      st->print(", ECX=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_ECX]);
      st->print(", EDX=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_EDX]);
      st->cr();
      st->print(  "ESP=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_UESP]);
      st->print(", EBP=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_EBP]);
      st->print(", ESI=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_ESI]);
      st->print(", EDI=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_EDI]);
      st->cr();
      st->print(  "EIP=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_EIP]);
      st->print(", CR2=" INTPTR_FORMAT, uc->uc_mcontext.cr2);
      st->print(", EFLAGS=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_EFL]);
    #endif // AMD64
      st->cr();
      st->cr();

      intptr_t *sp = (intptr_t *)os::Linux::ucontext_get_sp(uc);
      st->print_cr("Top of Stack: (sp=" PTR_FORMAT ")", sp);
      print_hex_dump(st, (address)sp, (address)(sp + 8*sizeof(intptr_t)), sizeof(intptr_t));
      st->cr();

      // Note: it may be unsafe to inspect memory near pc. For example, pc may
      // point to garbage if entry point in an nmethod is corrupted. Leave
      // this at the end, and hope for the best.
      address pc = os::Linux::ucontext_get_pc(uc);
      st->print_cr("Instructions: (pc=" PTR_FORMAT ")", pc);
      print_hex_dump(st, pc - 16, pc + 16, sizeof(char));
    }
Register values are very important when analyzing a crash.
The report also prints some of the machine code around the instruction being executed:
    Instructions: (pc=0x00007f48f14ef51a)
    0x00007f48f14ef4fa:   90 90 55 48 89 e5 48 81 ec 98 9f 00 00 48 89 bd
    0x00007f48f14ef50a:   f8 5f ff ff 48 89 b5 f0 5f ff ff b8 00 00 00 00
    0x00007f48f14ef51a:   c7 00 01 00 00 00 c6 85 00 60 ff ff ff c9 c3 90
    0x00007f48f14ef52a:   90 90 90 90 90 90 55 48 89 e5 53 48 8d 1d 94 00
The Instructions section prints part of the machine code around pc.
The format is:
address: machine code
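Before feeding these bytes to a disassembler it can help to pull them out of the report programmatically. The following is a small sketch of a parser for one "address: machine code" line; DumpLine and parse_dump_line are hypothetical names, and the parser assumes the bytes are space-separated two-digit hex values as in the dump above.

```cpp
#include <cassert>
#include <cstdint>
#include <exception>
#include <sstream>
#include <string>
#include <vector>

struct DumpLine {
    std::uint64_t address = 0;          // address of the first byte on the line
    std::vector<std::uint8_t> bytes;    // machine code bytes in order
};

// Parses a line such as "0x00007f48f14ef51a: c7 00 01 00 ...".
// Returns false if there is no ':' separator, the address is not hex,
// or a token does not fit in one byte.
bool parse_dump_line(const std::string& line, DumpLine& out) {
    std::size_t colon = line.find(':');
    if (colon == std::string::npos) return false;
    try {
        out.address = std::stoull(line.substr(0, colon), nullptr, 16);
    } catch (const std::exception&) {
        return false;
    }
    out.bytes.clear();
    std::istringstream rest(line.substr(colon + 1));
    unsigned value = 0;
    while (rest >> std::hex >> value) {
        if (value > 0xff) return false;
        out.bytes.push_back(static_cast<std::uint8_t>(value));
    }
    return true;
}
```

In the dump above, pc=0x00007f48f14ef51a is the start of the third line, so the faulting instruction begins with the bytes c7 00 01 00 00 00. By my reading of the x86-64 encoding this is mov DWORD PTR [rax], 0x1, a store through RAX, but a disassembler should be used to confirm it.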
The first way to disassemble it is the udcli tool that ships with the udis86 library.
Command:
echo '90 90 55 48 89 e5 48 81 ec 98 9f 00 00 48 89 bd' | udcli -intel -x -64 -o 0x00007f48f14ef4fa
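This shows the corresponding assembly.
The second way is:
objdump -d -C libjvm.so >> jvmsodisass.dump
Look up the offset 0x593045 in the dump to find the instruction that was executing, then use the surrounding assembly and the source code to work out which statement caused the problem.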
d. What the register values point to
    RAX=0x0000000000000000 is an unknown value
    RBX=0x000000041a07d1e8 is an oop
    {method}
     - klass: {other class}
    RCX=0x0000000000000000 is an unknown value
    RDX=0x0000000040111800 is a thread
    RSP=0x0000000041261b88 is pointing into the stack for thread: 0x0000000040111800
    RBP=0x000000004126bb20 is pointing into the stack for thread: 0x0000000040111800
    RSI=0x000000004126bb80 is pointing into the stack for thread: 0x0000000040111800
    RDI=0x00000000401119d0 is an unknown value
    R8 =0x0000000040111c40 is an unknown value
    R9 =0x00007f48fcc8b550: <offset 0xa85550> in /usr/java/jdk1.6.0_30/jre/lib/amd64/server/libjvm.so at 0x00007f48fc206000
    R10=0x00007f48f8ca7d41 is an Interpreter codelet
    method entry point (kind = native)  [0x00007f48f8ca7ae0, 0x00007f48f8ca8320]  2112 bytes
    R11=0x00007f48fc98f270: <offset 0x789270> in /usr/java/jdk1.6.0_30/jre/lib/amd64/server/libjvm.so at 0x00007f48fc206000
    R12=0x0000000000000000 is an unknown value
    R13=0x000000041a07d1e8 is an oop
    {method}
     - klass: {other class}
    R14=0x000000004126bb88 is pointing into the stack for thread: 0x0000000040111800
    R15=0x0000000040111800 is a thread
The JVM tries to map each register value to a known object or memory region, which is also a useful reference.
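The "<offset ...> in ... at ..." entries can be checked mechanically: the register value should equal the library's load address plus the offset. A minimal sketch using std::regex follows; LibraryHit and parse_library_hit are hypothetical names, and the pattern is an assumption derived only from the two lines shown above, so other JVM versions may format this differently.

```cpp
#include <cassert>
#include <cstdint>
#include <regex>
#include <string>

struct LibraryHit {
    std::uint64_t value = 0;    // register value
    std::uint64_t offset = 0;   // offset inside the library
    std::uint64_t base = 0;     // address the library is loaded at
    std::string path;           // library path
};

// Parses lines of the form
//   R9 =0x...: <offset 0x...> in /path/to/lib.so at 0x...
// Returns false if the line does not match that shape.
bool parse_library_hit(const std::string& line, LibraryHit& out) {
    static const std::regex re(
        R"(=\s*(0x[0-9a-fA-F]+)\s*:\s*<offset\s+(0x[0-9a-fA-F]+)\s*>\s+in\s+(\S+)\s+at\s+(0x[0-9a-fA-F]+))");
    std::smatch m;
    if (!std::regex_search(line, m, re)) return false;
    out.value = std::stoull(m[1].str(), nullptr, 16);
    out.offset = std::stoull(m[2].str(), nullptr, 16);
    out.path = m[3].str();
    out.base = std::stoull(m[4].str(), nullptr, 16);
    return true;
}
```

For the R9 line above, 0x00007f48fc206000 + 0xa85550 gives 0x00007f48fcc8b550, which matches the register value. The offset is the number to look up in a disassembly of libjvm.so.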
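e. Other information
The error file also contains thread information and the memory map at the time of the crash, all of which can serve as additional input for the analysis.
The crash report gives a rough picture of what was happening at the time of the crash and is especially helpful when there is no core dump. If a core dump is available, though, it is still the core dump that leads to the root cause quickly and accurately.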
Original link: http://blog.csdn.net/raintungli/article/details/7642575