My Production JVM is Missing
Or — dude, where’s my JVM?
Once in a while, you'll have a production issue and notice that your JVM is no longer running: ps -ef | grep <app name> returns nothing. There are several possible reasons why the JVM is no longer running.
- Someone stopped it or killed it
- OutOfMemoryErrors in application code
- SIGSEGV or SIGBUS in JVM or JNI code
- Linux OutOfMemory Killer reaped it
- Incorrectly specified values for -Xmx and -Xms
- Potential issues when using tmpfs
The simplest, and perhaps least likely, case is that someone stopped it deliberately. If your application offers a REST endpoint to stop or restart the service, the answer may be as simple as: someone hit that endpoint and deliberately stopped it. Hopefully, there is some trace in the endpoint's code that you can locate in the log files. Maybe the username or some other identifying information is included with the trace to help you track down who did it. Bonus points for asking why this REST endpoint was available to them.
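If you suspect this, a quick search of the logs can confirm it. The endpoint name and log locations below are just placeholders; substitute your application's own shutdown endpoint and log directory:
# search the application and access logs for hits on a (hypothetical) shutdown endpoint
grep -i "shutdown" /theLogDirectory/application.log
grep -i "POST /admin/shutdown" /theLogDirectory/access.log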
Next, we should look through the log files, stdout, and stderr for OutOfMemoryErrors. Since OutOfMemoryError is an Error (and therefore unchecked), the developer is not required to handle it, so it often surfaces only in stderr or the logs.
OutOfMemoryErrors are often caused by the growth of unbounded data structures, for example, a Map or Set to which elements are continually added but never removed. In this case, it's only a matter of time before the JVM runs out of heap space and an attempt to allocate a new object throws an OutOfMemoryError.
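If the JVM is still up and you suspect a collection is growing without bound, a class histogram can often point at the culprit. jmap ships with the JDK; <pid> is your JVM's process id:
# print a histogram of live objects, sorted by bytes per class; an unbounded Map or Set
# usually shows its entry/element classes near the top of this list
jmap -histo:live <pid> | head -20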
(Note: throughout this article, the JVM options are for Java 8)
In this case, we can add some JVM options to help gather information about the garbage collection (GC) process (a sample launch command using these flags follows the list):
- -XX:+PrintGCDetails
- -XX:+PrintGCDateStamps
- -Xloggc:/theTargetDirectory/gc.log
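As a concrete example, a launch command with these flags might look like this; app.jar and the target directory are placeholders for your own application and log location:
# hypothetical Java 8 launch command with GC logging enabled
java -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/theTargetDirectory/gc.log -jar app.jar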
If the JVM crashes, remember to save the gc.log because restarting the JVM with the same JVM options will overwrite the gc.log file.
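A simple way to avoid losing it is to copy the file aside with a timestamp before restarting (the path assumes the -Xloggc location above):
# preserve the GC log from the crashed JVM before restarting
cp /theTargetDirectory/gc.log /theTargetDirectory/gc.log.$(date +%Y%m%d-%H%M%S)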
There are some JVM options that I always recommend when running the JVM in any environment (Dev, Prod, everywhere), to help us understand what happened in the case of OutOfMemoryErrors that are caused by application code:
- -XX:+HeapDumpOnOutOfMemoryError: this option makes the JVM produce a heap dump when an OutOfMemoryError occurs.
- -XX:HeapDumpPath=(filename): this option tells the JVM where to write the heap dump. Ensure there is enough space in that file system, since the heap dump may be about the same size as the heap (heap size is controlled by -Xmx and -Xms).
These JVM options have no run-time effect until an OutOfMemoryError actually occurs.
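Putting it together, a launch command that includes both the GC logging and the heap dump options might look like this (again, app.jar and the directories are placeholders):
# hypothetical Java 8 launch command: GC logging plus a heap dump on OutOfMemoryError
java -XX:+HeapDumpOnOutOfMemoryError \
  -XX:HeapDumpPath=/theTargetDirectory/heapdump.hprof \
  -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
  -Xloggc:/theTargetDirectory/gc.log \
  -jar app.jar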
Analyzing the heap dump is a topic for another day. :)
The JVM may have crashed in the low-level code that implements the core of the JVM itself, meaning it may have crashed in a library that is part of the JVM, or in JNI code (Java-callable code written in another language, e.g. C or C++). In these cases, you may find a file named hs_err_pid(pid).log (where "(pid)" is the actual pid number). When you open this file, you'll find the reason for the crash and a stack trace with details about the thread that caused it.
The hs_err_pid file also lists all the .jar files and system libraries the JVM had open, along with information about shared memory segments and, near the bottom, the machine's configuration details. The last entry in the file is a timestamp from the file's creation.
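If you aren't sure where the file landed (by default it is written to the JVM's working directory), a quick search will usually find it; adjust the starting directory for your own layout:
# look for hs_err files created in roughly the last day
find /theAppDirectory -name "hs_err_pid*" -mtime -1 2>/dev/null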
There is a JVM option that controls the placement and naming of this file. When running any JVM-based program, you should always include it:
-XX:ErrorFile=/theTargetDir/hs_err_%p.log
These log files can be very important when troubleshooting JVM crashes, and they are typically only a few MB in size. Many people choose to put these files in the same directory as the application’s log files.
Another way for the JVM to disappear is for the Linux OutOfMemory Killer to decide that the JVM's memory requirements are too large, or that the growth rate of the JVM's memory footprint suggests it is allocating objects in a loop and growing in an uncontrolled way. If the OutOfMemory Killer has identified your JVM as the "runaway process", it will kill it. When the Linux OutOfMemory Killer kills your JVM, it writes a trace to the kernel log, which you can find in the system messages log file or with the dmesg command. The trace is pretty obvious; it is easy to identify when this happens.
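Something like the following usually turns up the kill record; the exact wording varies a little between kernel versions:
# search the kernel log for OOM-killer activity (-T prints human-readable timestamps)
dmesg -T | grep -i -E "out of memory|killed process"
# or, on systems that keep a messages file:
grep -i "killed process" /var/log/messages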
The resolution to this issue would be to set the -Xmx and -Xms to the same value, which should be less than 70% of the memory on the server. This guideline changes with the amount of memory on the server (the percentage of memory for the heap will be smaller on machines with less memory), so there may be some trial-and-error in setting this value.
General tuning guidance says it is a good idea to reserve several GB for Linux, the file system buffer cache, and the inode cache. The JVM also requires memory for many things that are not part of the heap: the application's bytecode that gets loaded into memory, direct ByteBuffers, native thread stacks, and the CodeCache. Depending on the amount of memory on the server, the memory available for the heap may be much less than the 70% mentioned above. For example, on a server with 8GB, we could leave 2GB for Linux and the file system cache and 2GB for the non-heap portions of the JVM; the max heap size would then be about 3GB, or about 38% of the server's 8GB of memory.
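To make that arithmetic concrete, the 8GB example might translate into a launch command like this; the heap size and application name are assumptions to adjust for your own server:
# 8GB server: ~2GB for Linux + file system cache, ~2GB for JVM non-heap areas,
# leaving roughly 3GB for the heap, with -Xms and -Xmx set to the same value
java -Xms3g -Xmx3g -jar app.jar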
Another factor to consider when setting the heap size via the -Xmx and -Xms JVM options is whether the system is configured with swap space. On a system with a tiny swap space (2 or 3 GB), it is sometimes possible to exhaust the swap and end up with a JVM crash and an hs_err_pid file that states "java.lang.OutOfMemoryError: unable to create new native thread".
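It's worth checking how much swap, if any, the machine actually has before settling on a heap size; either of these will tell you (swapon -s on older distributions):
# show configured swap devices and overall memory/swap usage
swapon --show
free -h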
Another issue that is sometimes overlooked is when Linux has been set up with /tmp mounted on tmpfs, which means the /tmp directory is held in memory, not on disk. This is a complication because tmpfs is a memory-backed file system, so files written there reduce the memory available for the heap. It's good practice to first subtract the tmpfs size from the server's memory and fit the JVM into the remaining memory when applying the guidelines above.
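You can see at a glance whether /tmp is on tmpfs and how large it is:
# show the file system type and size backing /tmp, and any other tmpfs mounts
df -hT /tmp
mount | grep tmpfs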
In this note, we've covered some of the things you can do to troubleshoot when the JVM disappears or crashes in your production or dev environment.