java - How can I debug a non-responsive server, when the profiler can't collect samples?


I have been having occasional problems with a server I wrote. It's in Clojure, but I don't think that matters, and you can pretend it's in Java. Anyway, it works fine for hours at a time, and then goes into fits where it behaves badly: all activity stops for around fifteen seconds, then it works for a few seconds, then it stops for fifteen seconds... and so on for (usually) ten minutes or so, after which it goes back to behaving normally.

I've done a lot of profiling with YourKit, and I've ruled out a number of plausible suspects:

  • It's not a garbage collection issue: I'm running with -XX:+UseConcMarkSweepGC, and I've verified that the server continues to run fine during both minor and major collections, due to the concurrent nature of the garbage collector. And we're not thrashing because we've run out of total memory or something: the current heap size is below the max.

  • I don't think it's a locking/synchronization issue, but I'm not 100% sure on that. The YourKit profiler shows threads waiting sometimes, e.g. competing on the lock for System.out to produce log messages, but the only long waits are worker threads in thread pools when there's nothing to do. And of course YourKit says it has never detected any deadlocks.

  • It's not caused by having the profiler attached, because it still happens if I boot the server and leave it alone without ever attaching the profiler.

  • It's not some other process on the system taking up CPU time: top shows CPU usage at 100% for the java process, and 0% for everything else.
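Two of the suspects above (garbage collection and deadlocks) can also be cross-checked from inside the JVM with the standard `java.lang.management` beans, independently of the profiler. The following is only a minimal sketch: the class name `SuspectCheck` and its method names are made up for illustration, and the one-second sampling window is an arbitrary assumption. It avoids newer syntax so that it should also compile on a Java 6 JVM like the one in the question.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of the original server.
final class SuspectCheck {

    /** Accumulated collection time in milliseconds, summed over all collectors the JVM reports. */
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report a time
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Number of threads the JVM currently considers deadlocked (0 if none). */
    public static int deadlockedThreadCount() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads(); // null when no deadlock is found
        return ids == null ? 0 : ids.length;
    }

    public static void main(String[] args) throws InterruptedException {
        long before = totalGcMillis();
        Thread.sleep(1000); // arbitrary sampling window
        long after = totalGcMillis();
        System.out.println("GC time reported in the last second (ms): " + (after - before));
        System.out.println("Deadlocked threads: " + deadlockedThreadCount());
    }
}
```

Logging these two numbers periodically from the server itself would give a record to compare against the timestamps of the stalls, assuming the logging thread gets to run during them.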

My biggest problem is that I can't see what the server is doing during these strange funks, because the profiler stops receiving samples. Here's a graph of the CPU usage chart:

[Image: YourKit CPU-graph screenshot]

The left side of the graph is normal operation, during which the profiler gets samples every second or so. The right side is "broken", and it is spiky because the profiler is only getting samples every ten seconds or so. In the samples we do get, the server seems to be doing its usual business: responding to requests and so on. The logs confirm that it is doing normal stuff, but only at the times the profiler has samples for: during the upward-sloping "straight lines" on the graph, where the profiler has no samples, the server is doing nothing at all.
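One way to tell whether the whole JVM is frozen during those gaps, or only the application threads, is a heartbeat thread that sleeps for a short fixed interval and records how much later than expected it wakes up. The sketch below is an assumption-laden illustration, not something from the original server: the class name `PauseDetector`, the 100 ms interval, and the 1000 ms threshold are all hypothetical.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical heartbeat thread: reports when it wakes up much later than requested.
final class PauseDetector implements Runnable {
    private final long intervalMillis;
    private final long thresholdMillis;

    PauseDetector(long intervalMillis, long thresholdMillis) {
        this.intervalMillis = intervalMillis;
        this.thresholdMillis = thresholdMillis;
    }

    /** How many milliseconds longer than the requested interval the sleep actually took. */
    public static long overshootMillis(long startNanos, long endNanos, long intervalMillis) {
        return TimeUnit.NANOSECONDS.toMillis(endNanos - startNanos) - intervalMillis;
    }

    public void run() {
        long last = System.nanoTime();
        while (!Thread.currentThread().isInterrupted()) {
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                return; // stop when interrupted
            }
            long now = System.nanoTime();
            long overshoot = overshootMillis(last, now, intervalMillis);
            last = now;
            if (overshoot > thresholdMillis) {
                System.err.println("Heartbeat woke up " + overshoot + " ms late at "
                        + System.currentTimeMillis());
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t = new Thread(new PauseDetector(100, 1000), "pause-detector");
        t.setDaemon(true);
        t.start();
        Thread.sleep(2000); // stand-in for the server's real work
        t.interrupt();
        t.join();
    }
}
```

If the heartbeat also goes silent for around fifteen seconds at a time, the stall is probably at the JVM or operating-system level rather than in the application's own threads. If it keeps ticking on schedule while the server does nothing, the application threads themselves are the more likely place to look.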

So, does this graph look familiar to anyone? Have you had this problem before and fixed it? Or can you point me in the direction of a tool that can figure out what the server is doing during the time when YourKit can't? In case it matters, the server machine is running Ubuntu 10.04, and:

$ java -version
java version "1.6.0_22"
OpenJDK Runtime Environment (IcedTea6 1.10.10) (rhel-1.28.1.10.10.el5_8-x86_64)
OpenJDK 64-Bit Server VM (build 20.0-b11, mixed mode)

Okay, from the comments it seems clear to me that we are not going to be able to figure this out from the information you've given so far. The best I can do is give you suggestions on how to debug it...

I would try to use jstack during one of the spikes and see if you can use that to figure out where it hangs.
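From the command line that would be `jstack <pid>` (or `jstack -F <pid>` if the process does not respond); sending the JVM a SIGQUIT with `kill -3 <pid>` also prints a thread dump to its standard output. If catching a stall by hand is awkward, a similar dump can be produced from inside the server with the standard `ThreadMXBean` API. The following is only a sketch: the class name `StackDumper` is hypothetical, and it can only help if the JVM is still able to run the thread that calls it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical in-process approximation of a jstack dump.
final class StackDumper {

    /** Returns the name, state, lock details, and full stack trace of every live thread. */
    public static String dumpAllThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Only ask for lock details the JVM says it can provide.
        ThreadInfo[] infos = mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(),
                mx.isSynchronizerUsageSupported());
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : infos) {
            if (info == null) {
                continue; // defensive: skip threads that ended while the dump was taken
            }
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState());
            if (info.getLockName() != null) {
                sb.append(" on ").append(info.getLockName());
            }
            if (info.getLockOwnerName() != null) {
                sb.append(" owned by \"").append(info.getLockOwnerName()).append('"');
            }
            sb.append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

Calling `dumpAllThreads()` from a watchdog thread whenever a stall is suspected, and writing the result to a log file, would leave a record of what each thread was blocked on. If nothing gets written during the stalls at all, that in itself suggests the whole JVM is paused, and an external tool such as jstack is the better option.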

