The phrase "memory leak" can send shivers down the spine of the most seasoned developer. Having some process on your server that is glomming onto memory and failing to release it is a guaranteed all-nighter lurking somewhere in your future. Recently we were debugging a new, soon-to-be-released application and discovered what looked like a memory leak. The JVM memory in use would climb steadily toward the maximum heap size, and when the runtime garbage collection kicked in it would recover only about a third of the increase. So, for example, memory use would climb from 300 megs to 600 megs, then GC would reduce usage back to 500 megs, and so on. This situation would inevitably lock up the server with out of memory errors. What follows is a recap of our troubleshooting journey.
While CF guru Mike Klostermeyer examined the code that was most commonly in play, I was tasked with examining the Java settings. The first thing I did was fiddle with manually firing garbage collection. One set of code I found looked like this:
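The snippet itself didn't survive into this recap, but the usual approach is to call into java.lang.Runtime from CFML. A sketch (the Java method names are the real API; the surrounding CFML is reconstructed):

```cfml
<!--- grab the JVM's Runtime object and ask it (politely) to collect garbage --->
<cfset runtime = createObject("java", "java.lang.Runtime").getRuntime()>
<cfset runtime.gc()>
<!--- freeMemory() lets you eyeball how much the call actually recovered --->
<cfset freeAfter = runtime.freeMemory()>
```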
I decided to try a "system" garbage collection. As I understand it, the runtime GC is a suggestion, but the system GC is a "stop the world" command. I whipped up the following code:
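That code is also missing from this recap; in CFML a "system" collection is just a static call into java.lang.System. A reconstruction (note that, strictly speaking, System.gc() is also only a request to the JVM, though on the classic collectors it typically triggers a full collection):

```cfml
<!--- request a full, "stop the world" style collection via java.lang.System --->
<cfset sys = createObject("java", "java.lang.System")>
<cfset sys.gc()>
```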
Now in case you missed it in school or on Entertainment Tonight, Java divides the heap into "new" generation space and "old" (tenured) generation space. Without boring you with copious details, objects are always created in the "new" generation space on the theory that most objects live a very short time. For example, when you create a variable in the local scope it survives until the request ends, at which point it can be safely deleted from the heap. So the vast majority of variables and objects in any code base survive only a short time and live their entire life in the herky-jerky world of the "new" generation. Objects that are intended to live beyond a single request (like application and session variables) get a buyout and are moved to the "old" generation space.
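In CF terms the distinction roughly maps to scopes. A loose illustration (getQuote() and loadPrefs() are made-up function names):

```cfml
<!--- request-scoped: lives for milliseconds, born and collected in the "new" generation --->
<cfset variables.tempQuote = getQuote()>
<!--- session-scoped: survives across requests, so it gets promoted ("tenured") to the "old" generation --->
<cfset session.userPrefs = loadPrefs()>
```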
A lot of the oddly named Java switches that we play with in the jvm.config file have to do with allocating or collecting memory in the "old" or the "new" heap space. For example, -XX:+UseParNewGC tells the JVM to use the parallel collector for cleaning up the new space. Anyway, the theory in our case was that the runaway memory allocation was occurring in "tenured" (old) memory. Mike and I began to work with the scopes that we considered candidates for "old" memory - application, server and session scoped objects and variables. After a day Mike finally found this snippet of code.
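The snippet itself is lost to time; reconstructed from the description below (getSnapQuote() is an assumed method name on application.ChartDataObj), the pattern was roughly:

```cfml
<cfif NOT structKeyExists(session, "snapData")>
    <!--- cache the returned array in the session so the next 11 or 12
          nearly simultaneous chart requests can reuse it --->
    <cfset session.snapData = application.ChartDataObj.getSnapQuote()>
</cfif>
<!--- reference the session copy into the local variable --->
<cfset snapData = session.snapData>
```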
The purpose of this code is to retrieve a real-time stock quote in order to append the value to one of 12 or 13 studies and charts. Because we didn't want to get the quote 12 times in a row, we store the quote in the session and then access it from the subsequent (nearly simultaneous) requests. The "application.ChartDataObj" is a collection of methods with no properties attached. So the code above either pulls the data directly from the session, or creates it and references it in the session. In either case the goal of this code block is to create the "snapData" variable (an array) for use later on in the function. The variable "snapData" is correctly var'ed at the top of the function.
When Mike removed all of this code and replaced it with a single direct call - getting the quote fresh each time instead of caching a reference to it in the session - the runaway memory growth disappeared. The replacement was just:
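Reconstructed the same way (the method name is an assumption), the replacement was nothing more than:

```cfml
<!--- skip the session caching entirely; just get the quote --->
<cfset snapData = application.ChartDataObj.getSnapQuote()>
```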
What can be gleaned from this exercise? At the very least, one rule of thumb for us is to carefully consider how we handle objects that are cached in persistent scopes. Boiled down to a single rule it would be: "Avoid referencing returned objects from one persistent scope to another; copy by value instead."
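In CFML that rule is the difference between the two lines below - duplicate() performs a deep copy, so the session holds its own data rather than a reference tied to the application-scoped object (the method name is assumed, as before):

```cfml
<!--- by reference: the session and the returned object share the same underlying data --->
<cfset session.snapData = application.ChartDataObj.getSnapQuote()>

<!--- by value: duplicate() deep-copies the array, decoupling the two scopes --->
<cfset session.snapData = duplicate(application.ChartDataObj.getSnapQuote())>
```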
In case you wanted a rundown of the final JVM arguments we arrived at through trial and error, we found the following to work well in our environment (your environment may be quite different):
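The actual argument list did not survive in this copy of the post. Purely as a placeholder illustration (every value below is invented, not our measured settings), a jvm.config tuned along these lines would look something like:

```
# hypothetical example values - tune against your own load testing
java.args=-server -Xms512m -Xmx512m -XX:MaxPermSize=128m -XX:+UseParNewGC
```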
We are also indebted (as always) to the many fine gurus who cheerily help out on the email lists to which we subscribe. The following blog posts on Java and ColdFusion deserve honorable mention: