java - How can I debug a non-responsive server, when the profiler can't collect samples?
I have been having occasional problems with a server I wrote. It's in Clojure, but I don't think that matters, and you can pretend it's in Java. Anyway, it works fine for hours at a time, and then goes into fits where it behaves badly: all activity stops for around fifteen seconds, then it works for a few seconds, then it stops for fifteen seconds... and so on for (usually) ten minutes or so, after which it goes back to behaving normally.
I've done a lot of profiling with YourKit, and I've ruled out a number of plausible suspects:
It's not a garbage collection issue: I'm running with -XX:+UseConcMarkSweepGC, and I've verified that the server continues to run fine during both minor and major collections, due to the concurrent nature of the garbage collector. And we're not thrashing because we've run out of total memory or something: the current heap size is below the max.
I don't think it's a locking/synchronization issue, but I'm not 100% sure on that. The YourKit profiler shows threads waiting sometimes, e.g. competing on the lock for System.out to produce log messages, but the long waits are for worker threads in threadpools when there's nothing to do. And of course YourKit says it has never detected any deadlocks.
It's not caused by having the profiler attached, because it still happens if I boot the server and leave it alone without ever attaching the profiler.
It's not some other process on the system taking up CPU time: top shows CPU usage at 100% for the java process, and 0% for everything else.
My biggest problem is that I can't see what the server is doing during these strange funks, because the profiler stops receiving samples. Here's a graph of the CPU usage chart:
The left side of the graph is normal operation, during which the profiler gets samples every second or so. The right side is "broken", and is spiky because the profiler is only getting samples every ten seconds or so. In the samples it does get, the server seems to be doing its usual business: responding to requests and so on; and the logs confirm that it is doing normal stuff, but only at the times the profiler has samples for. During the upward-sloping "straight lines" on the graph, where the profiler has no samples, the server is doing nothing at all.
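One way to check whether the whole JVM goes quiet during those gaps, rather than just the profiler's sampling, would be a heartbeat thread that sleeps for a short interval and reports whenever it wakes up much later than intended. The following is only a minimal sketch under assumptions: the class name HeartbeatMonitor, the interval, and the threshold are all made up for illustration and are not part of the server described here.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not part of the original server): logs a line
// whenever a short sleep overshoots by more than a chosen threshold.
final class HeartbeatMonitor implements Runnable {
    private final long intervalMillis;
    private final long thresholdMillis;

    HeartbeatMonitor(long intervalMillis, long thresholdMillis) {
        this.intervalMillis = intervalMillis;
        this.thresholdMillis = thresholdMillis;
    }

    /** Milliseconds by which a sleep exceeded its intended interval (never negative). */
    static long overshootMillis(long startNanos, long endNanos, long intervalMillis) {
        long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(endNanos - startNanos);
        return Math.max(0L, elapsedMillis - intervalMillis);
    }

    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            long start = System.nanoTime();
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop
                return;
            }
            long late = overshootMillis(start, System.nanoTime(), intervalMillis);
            if (late >= thresholdMillis) {
                System.err.println(System.currentTimeMillis()
                        + " heartbeat late by " + late + " ms");
            }
        }
    }

    /** Starts the monitor on a daemon thread so it never keeps the JVM alive. */
    static Thread start(long intervalMillis, long thresholdMillis) {
        Thread t = new Thread(
                new HeartbeatMonitor(intervalMillis, thresholdMillis), "heartbeat-monitor");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

Calling something like HeartbeatMonitor.start(100, 1000) at startup would be enough to try it. If the heartbeat is late during the funks, that would suggest the entire JVM (or the machine) is stalling; if it stays on time, the JVM is running and the application threads themselves are the ones stuck or busy.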
So, is this graph familiar to anyone? Have you had this problem before and fixed it? Or can you point me in the direction of a tool that can figure out what the server is doing during the time when YourKit can't? In case it matters, the server machine is running Ubuntu 10.04, and
$ java -version
java version "1.6.0_22"
OpenJDK Runtime Environment (IcedTea6 1.10.10) (rhel-1.28.1.10.10.el5_8-x86_64)
OpenJDK 64-Bit Server VM (build 20.0-b11, mixed mode)
Okay, from the comments it seems clear to me that I'm not going to be able to figure this out from the information you've given so far. The best I can do is give suggestions on how to debug it...
I would try to use jstack during one of the spikes and see if you can use that to figure out where it hangs.
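If catching a spike by hand with jstack proves awkward, a similar dump can be taken from inside the process through the standard java.lang.management API. What follows is a minimal sketch rather than anything from the original exchange: the class name ThreadDumper and its output format are made up for illustration, and it assumes you can add a small class to the server and call it from, say, a timer or an admin request.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper (name and output format are illustrative only).
final class ThreadDumper {

    /** Returns a jstack-like text dump of every live thread in this JVM. */
    static String dumpAllThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Request lock details only where this JVM supports them.
        ThreadInfo[] infos = mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(),
                mx.isSynchronizerUsageSupported());

        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : infos) {
            if (info == null) {
                continue; // defensive: skip any missing entry
            }
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState());
            if (info.getLockName() != null) {
                out.append(" on ").append(info.getLockName());
            }
            if (info.getLockOwnerName() != null) {
                out.append(" owned by \"").append(info.getLockOwnerName()).append('"');
            }
            out.append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

Taking several dumps a few seconds apart during a bad period and comparing them should show which threads are stuck in the same place. If the whole JVM is frozen, this code will freeze along with it, in which case running jstack from outside the process is the better bet.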