Memory management is the process of observing the server's behavior and tuning the JVM HotSpot parameters so that the Fiorano Peer Server can perform at its highest throughput with the lowest risk of running into memory-related problems. This is not an optional step taken only to improve performance; memory tuning and management must be done at the server to avoid a class of serious problems that can occur because of improper configuration of the Virtual Machine (VM) on which the server is running. Problems of this nature are very hard to reproduce and can create a lot of confusion with regard to the server behavior.
For example, if a "stop-the-world" garbage collection pauses the Peer VM for longer than the time the Enterprise Server waits for a response to an action command, the request times out and an error is shown in Fiorano Studio. If the primary Peer VM is paused for an interval greater than the ping interval, the backup HA server might conclude that the primary server is down, which can lead to problems.
Tuning the memory is also very important for achieving throughput goals. A badly tuned server may spend almost all of its time doing garbage collection, sparing only a few CPU cycles for server execution.
The following sections describe the factors considered when the Peer Server VM is tuned.
Factors considered when Peer Server is Tuned
Physical Machine Configuration
One of the most important factors in tuning the Peer VM is the actual machine hardware configuration. The amount of physical memory, the number of CPUs or cores, the operating system, the type of system (32-bit or 64-bit), and so on, all have a bearing on performance. The server can afford an efficient concurrent garbage collection if it has processor cycles to spare. A few GC algorithms run best on a single core, while others should be run only on multi-core machines. The native process stack size can be reduced for some hardware and OS combinations, while for others it is not safe to do so. If the heap space allocated to the peer is much larger than the actual physical memory, heavy swapping occurs between memory and disk, severely degrading server performance. A larger heap also means a longer pause when the garbage collector does a full cleanup.
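The heap-versus-physical-memory concern can be checked with a few lines of Java. This is a minimal sketch, not part of the Fiorano tooling: the class name `HeapFitCheck` and the 25% reserve for the OS and other processes are assumptions chosen for illustration, and the physical memory lookup relies on the `com.sun.management` extension, which is available on HotSpot JVMs but not guaranteed elsewhere.

```java
import java.lang.management.ManagementFactory;

/** Sketch: flag a maximum heap that is large relative to physical RAM. */
class HeapFitCheck {

    /**
     * Returns true when maxHeapBytes fits into physical memory after setting
     * aside reserveFraction for the OS and other processes.
     * The reserve value is an assumption, not a Fiorano recommendation.
     */
    static boolean heapFits(long maxHeapBytes, long physicalBytes, double reserveFraction) {
        long usable = (long) (physicalBytes * (1.0 - reserveFraction));
        return maxHeapBytes <= usable;
    }

    public static void main(String[] args) {
        // Effective maximum heap (-Xmx) of the JVM running this check.
        long maxHeap = Runtime.getRuntime().maxMemory();

        java.lang.management.OperatingSystemMXBean os =
            ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            // HotSpot-specific extension; deprecated (but still present) on recent JDKs.
            long physical =
                ((com.sun.management.OperatingSystemMXBean) os).getTotalPhysicalMemorySize();
            System.out.println("max heap MB : " + (maxHeap >> 20));
            System.out.println("physical MB : " + (physical >> 20));
            System.out.println("heap fits   : " + heapFits(maxHeap, physical, 0.25));
        } else {
            System.out.println("Physical memory size is not available on this JVM.");
        }
    }
}
```

Run with the same `-Xmx` value planned for the Peer Server to see whether that heap would fit on the machine under the assumed reserve.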
Java Virtual Machine
The version of the JVM used and the type of JVM (32-bit or 64-bit) is probably the next most important factor in tuning the parameters for the server VM. A newer version of the JVM may implement new garbage collection algorithms, and the existing algorithms may be better tuned or may behave differently in certain cases. More heap memory can be allocated to a server running on a 64-bit JVM than to one running on a 32-bit JVM, which has a limit of 4 GB on the process address space.
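To confirm which kind of JVM a server is actually running on, the standard system properties can be inspected. The sketch below is illustrative only: the class name `JvmBitness` is hypothetical, and `sun.arch.data.model` is a HotSpot-specific property that other JVMs may not set, so the code falls back to a guess based on `os.arch`.

```java
/** Sketch: report the JVM version and whether it is a 32-bit or 64-bit VM. */
class JvmBitness {

    /**
     * dataModel is the value of the HotSpot-specific "sun.arch.data.model"
     * property (may be null on other JVMs); osArch is the "os.arch" property.
     */
    static String describe(String dataModel, String osArch) {
        if ("32".equals(dataModel) || "64".equals(dataModel)) {
            return dataModel + "-bit";
        }
        // Fallback: architecture names such as amd64 or aarch64 contain "64".
        if (osArch != null && osArch.contains("64")) {
            return "64-bit (guessed from os.arch)";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println("java.version : " + System.getProperty("java.version"));
        System.out.println("java.vm.name : " + System.getProperty("java.vm.name"));
        System.out.println("JVM type     : " + describe(
            System.getProperty("sun.arch.data.model"),
            System.getProperty("os.arch")));
    }
}
```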
Machine Setup
Although it is preferable to have a dedicated server setup for each Fiorano Peer Server, hardware limitations often make this impossible. Other programs may have to run simultaneously with the Peer Server, competing with it for available CPU cycles and memory. The Peer Server has to be tuned differently depending on the number of other programs running on the deployment machine.
Processing Message Size
This is one of the most important factors in deciding the amount of memory required by the Peer VM. The Peer VM is a JMS broker which transfers messages from components to other components. The memory required by the Peer Server VM depends on JMS context settings such as persistence, in-memory buffer size, and the number of connections and sessions created, and, more importantly, on the message sizes handled. Components running in-memory can require additional memory, typically equivalent to four or five times the size of the message, to parse the XML and process it.
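The "four or five times the message size" rule of thumb can be turned into a quick estimate. The sketch below is a rough illustration only: the class name `MessageMemoryEstimate` and the idea of multiplying by the number of messages being processed concurrently are assumptions added for the example, not figures from Fiorano.

```java
/** Sketch: rough extra heap needed by in-memory components to process messages. */
class MessageMemoryEstimate {

    /**
     * messageBytes       size of one message
     * concurrentMessages messages assumed to be in processing at the same time
     * multiplier         4 or 5, per the rule of thumb above
     */
    static long extraHeapBytes(long messageBytes, int concurrentMessages, int multiplier) {
        return messageBytes * multiplier * concurrentMessages;
    }

    public static void main(String[] args) {
        long oneMegabyte = 1L << 20;
        // 1 MB messages, 20 in flight, worst-case multiplier of 5.
        long extra = extraHeapBytes(oneMegabyte, 20, 5);
        System.out.println("extra heap MB: " + (extra >> 20)); // prints "extra heap MB: 100"
    }
}
```

An estimate like this is added on top of the heap sizes recommended below, which assume messages smaller than 1 MB.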
Peer Server Load
The Peer Server load is the number of components launched by it. Components can be launched in-memory, which means the component runs in the same VM as the Peer Server. For each component, a set of JMS resources is created, which consumes heap memory, and a certain number of threads are created per JMS session spawned. For components launched in-memory, the classes loaded for the components affect the VM perm gen space, in addition to the memory the component itself requires to run.
Recommendations
The following tree gives guidelines for the JVM HotSpot settings for a set of combinations of the above factors. These recommendations are only guidelines and not hard-and-fast rules. It is recommended to start tuning the server with the recommended setting that most closely matches the setup. These recommendations have consistent system behavior as their top priority, followed by performance.
These recommendations do not incorporate the dimension associated with processing messages of a certain size. These recommendations assume that the message sizes are below 1 MB and will not actually affect the memory required by the Peer Server to process the messages.
Each leaf in the path of the tree represents a recommended setting and the color of the node indicates the risk level associated with the particular set up.
Green represents Low Risk and falls into the recommended methods.
Yellow represents Medium Risk and has to be used with caution.
Red represents High Risk; the system is highly unstable.
Grey represents Low Risk with other factors assumed. The corresponding recommendations list the assumptions on which this represents Low Risk.
Interpreting and Applying Recommendations
Each recommendation is identified by a unique number written over the leaf circle in the above tree. The recommended settings are listed below along with their description and a rationale as to why the setting is recommended. To apply these recommendations, find the tuning section in server.conf (or fes.conf / fps.conf) and add the values listed in the Property Name column to that section.
{FIORANO_HOME}/esb/server/bin/server.conf
{FIORANO_HOME}/esb/fes/bin/fes.conf
{FIORANO_HOME}/esb/fps/bin/fps.conf
Table 1
Property Name | Description | Rationale |
---|---|---|
-Xms512m | Minimum Heap size | Dedicated servers can set the minimum and maximum heap size to the same value, which saves the JVM from allocating and reallocating memory and thereby increases throughput. |
-Xmx512m | Maximum Heap size | Same as above. |
-XX:+UseParallelOldGC | Garbage Collection Algorithm | Parallel GC reduces pause time and works well on small heap spaces with not many processors to spare. |
-XX:MaxGCPauseMillis=100000 | JVM Pause limit during GC | Limit the amount of time the garbage collector can pause the VM to be lower than the ESB call timeout (def: 120 seconds). |
Table 2
Property Name | Description | Rationale |
---|---|---|
-Xms128m | Minimum Heap size | Competing servers must have the lowest Xms which is safe for the server to execute. The VM is responsible for allocating more memory, up to the maximum, as and when required and shrinking back when needed. |
-Xmx512m | Maximum Heap size | Same as above. |
-XX:+UseParallelOldGC | Garbage Collection Algorithm | Parallel GC reduces pause time and works well on small heap spaces with not many processors to spare. |
-XX:MaxGCPauseMillis=100000 | JVM Pause limit during GC | Limit the amount of time the garbage collector can pause the VM to be lower than the ESB call timeout (def: 120 seconds). |
Table 3
Property Name | Description | Rationale |
---|---|---|
-Xms1024m | Minimum Heap size | Dedicated servers can set the minimum and maximum heap size to the same value, which saves the JVM from allocating and reallocating memory and thereby increases throughput. |
-Xmx1024m | Maximum Heap size | Same as above. |
-XX:+UseConcMarkSweepGC | Garbage Collection Algorithm | Works well for long running servers with a large heap memory that can afford to share CPU cycles with the garbage collector. Would result in the lowest pause times. |
-XX:MaxGCPauseMillis=100000 | JVM Pause limit during GC | Limit the amount of time the garbage collector can pause the VM to be lower than the ESB call timeout (def: 120 seconds). |
-XX:+CMSIncrementalMode | Optimization option | This optimizes the algorithm when it is run with two or fewer cores. |
-XX:MaxPermSize=128m | Perm Gen size | When the number of components launched in-memory increases, so does the number of classes loaded. Perm Gen space stores the metadata of the JVM and is not part of the Heap space. |
Table 4
Property Name | Description | Rationale |
---|---|---|
-Xms512m | Minimum Heap size | Competing servers must have the lowest Xms which is safe for the server to execute. The VM is responsible for allocating more memory, up to the maximum memory limit, and shrinking back when needed. |
-Xmx1024m | Maximum Heap size | Same as above. |
-XX:+UseParallelOldGC | Garbage Collection Algorithm | Works with constrained amounts of CPU cycles and limits the pauses under the specified limit by a parallel collection. |
-XX:MaxGCPauseMillis=100000 | JVM Pause limit during GC | Limit the amount of time the garbage collector can pause the VM to be lower than the ESB call timeout (def: 120 seconds). |
-XX:MaxPermSize=128m | Perm Gen size | When the number of components launched in-memory increases, so does the number of classes loaded. Perm Gen space stores the metadata of the JVM and is not part of the Heap space. |
Table 5
Property Name | Description | Rationale |
---|---|---|
-Xms1024m | Minimum Heap size | A maximum heap space of 1280 MB is recommended. More memory allocated to the heap would suffocate other process memory regions, and the number of threads that can be created would be dangerously low if this limit is exceeded. |
-Xmx1280m | Maximum Heap size | Same as above. |
-XX:+UseParallelOldGC | Garbage Collection Algorithm | Works with constrained amounts of CPU cycles and limits the pauses under the specified limit by a parallel collection. |
-XX:MaxGCPauseMillis=100000 | JVM Pause limit during GC | Limit the amount of time the garbage collector can pause the VM to be lower than the ESB call timeout (def: 120 seconds). |
-XX:MaxPermSize=128m | Perm Gen size | When the number of components launched in-memory increases, so does the number of classes loaded. Perm Gen space stores the metadata of the JVM and is not part of the Heap space. |
Table 6
Property Name | Description | Rationale |
---|---|---|
-Xms1280m | Minimum Heap size | A maximum heap space of 1280 MB is recommended. More memory allocated to the heap would suffocate other process memory regions, and the number of threads that can be created would be dangerously low if this limit is exceeded. |
-Xmx1280m | Maximum Heap size | Same as above. |
-XX:+UseConcMarkSweepGC | Garbage Collection Algorithm | Works well for long running servers with a large heap memory that can afford to share CPU cycles with the garbage collector. Would result in the lowest pause times. |
-XX:MaxGCPauseMillis=100000 | JVM Pause limit during GC | Limit the amount of time the garbage collector can pause the VM to be lower than the ESB call timeout (def: 120 seconds). |
-XX:MaxPermSize=256m | Perm Gen Space | When the number of components launched in-memory increases, so does the number of classes loaded. Perm Gen space stores the metadata of the JVM and is not part of the Heap space. |
Table 7
Property Name | Description | Rationale |
---|---|---|
-Xms1536m | Minimum Heap size | With 64-bit JVMs, the limit of 4 GB address space per process is removed and hence a large heap memory can be allocated to a process without any side effects. |
-Xmx1536m | Maximum Heap size | Same as above. |
-XX:+UseConcMarkSweepGC | Garbage Collection Algorithm | Works well for long running servers with a large heap memory that can afford to share CPU cycles with the garbage collector. Would result in the lowest pause times. |
-XX:MaxGCPauseMillis=100000 | JVM Pause limit during GC | Limit the amount of time the garbage collector can pause the VM to be lower than the ESB call timeout (def: 120 seconds). |
-XX:MaxPermSize=256m | Perm Gen Space | When the number of components launched in-memory increases, so does the number of classes loaded. Perm Gen space stores the metadata of the JVM and is not part of the Heap space. |
Table 8
Property Name | Description | Rationale |
---|---|---|
-Xms512m | Minimum Heap size | With 64-bit JVMs, the limit of 4 GB address space per process is removed and hence a large heap memory can be allocated to a process without any side effects. |
-Xmx1536m | Maximum Heap size | Same as above. |
-XX:+UseConcMarkSweepGC | Garbage Collection Algorithm | Works well for long running servers with a large heap memory that can afford to share CPU cycles with the garbage collector. Would result in the lowest pause times. |
-XX:MaxGCPauseMillis=100000 | JVM Pause limit during GC | Limit the amount of time the garbage collector can pause the VM to be lower than the ESB call timeout (def: 120 seconds). |
-XX:MaxPermSize=256m | Perm Gen Space | When the number of components launched in-memory increases, so does the number of classes loaded. Perm Gen space stores the metadata of the JVM and is not part of the Heap space. |
-XX:ParallelGCThreads=<no_of_cpu / no_of_competing_servers> | Option to limit monopolizing CPU resources | If no single application is to monopolize the CPU for a long period of time, it is recommended to limit the number of parallel threads which can be spawned. |
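
The placeholder `<no_of_cpu / no_of_competing_servers>` in the last row can be worked out per machine. The following is a minimal sketch under stated assumptions: the class name `GcThreadBudget` is hypothetical, the competing-server count is assumed to include the Peer Server itself, and the result is floored at one thread so the value is never zero.

```java
/** Sketch: derive a -XX:ParallelGCThreads value from CPU count and server count. */
class GcThreadBudget {

    /** Integer division of CPUs by competing servers, never less than 1. */
    static int parallelGcThreads(int cpus, int competingServers) {
        if (cpus < 1 || competingServers < 1) {
            throw new IllegalArgumentException("cpus and competingServers must be >= 1");
        }
        return Math.max(1, cpus / competingServers);
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        // Number of servers sharing the machine; defaults to 2 for illustration.
        int servers = args.length > 0 ? Integer.parseInt(args[0]) : 2;
        System.out.println("-XX:ParallelGCThreads=" + parallelGcThreads(cpus, servers));
    }
}
```

On an 8-core machine shared by two servers this yields `-XX:ParallelGCThreads=4`.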
Table 9
Property Name | Description | Rationale |
---|---|---|
-Xms1024m | Minimum Heap size | With 64-bit JVMs, the limit of 4 GB address space per process is removed and hence a large heap memory can be allocated to a process without any side effects. |
-Xmx2048m | Maximum Heap size | Same as above. |
-XX:+UseConcMarkSweepGC | Garbage Collection Algorithm | Works well for long running servers with a large heap memory that can afford to share CPU cycles with the garbage collector. Would result in the lowest pause times. |
-XX:MaxGCPauseMillis=100000 | JVM Pause limit during GC | Limit the amount of time the garbage collector can pause the VM to be lower than the ESB call timeout (def: 120 seconds). |
-XX:MaxPermSize=256m | Perm Gen Space | When the number of components launched in-memory increases, so does the number of classes loaded. Perm Gen space stores the metadata of the JVM and is not part of the Heap space. |
-XX:ParallelGCThreads=<no_of_cpu / no_of_competing_servers> | Option to limit monopolizing CPU resources | If no single application is to monopolize the CPU for a long period of time, it is recommended to limit the number of parallel threads which can be spawned. |
Table 10
Property Name | Description | Rationale |
---|---|---|
-Xms1536m | Minimum Heap size | With 64-bit JVMs, the limit of 4 GB address space per process is removed and hence a large heap memory can be allocated to a process without any side effects. |
-Xmx3072m | Maximum Heap size | Same as above. |
-XX:+UseConcMarkSweepGC | Garbage Collection Algorithm | Works well for long running servers with a large heap memory that can afford to share CPU cycles with the garbage collector. Would result in the lowest pause times. |
-XX:MaxGCPauseMillis=100000 | JVM Pause limit during GC | Limit the amount of time the garbage collector can pause the VM to be lower than the ESB call timeout (def: 120 seconds). |
-XX:MaxPermSize=512m | Perm Gen Space | When the number of components launched in-memory increases, so does the number of classes loaded. Perm Gen space stores the metadata of the JVM and is not part of the Heap space. |
-XX:+DisableExplicitGC | Disabling explicit GC requests | It is dangerous for a misbehaving in-memory component to continuously issue System.gc() calls when the heap memory allocated is large. This option makes the VM ignore explicit calls. The memory of the server is best managed by the VM. Garbage collection calls from within software code can be extremely dangerous. |
-XX:ParallelGCThreads=<no_of_cpu / no_of_competing_servers> | Option to limit monopolizing CPU resources | If no single application is to monopolize the CPU for a long period of time, it is recommended to limit the number of parallel threads which can be spawned. |
Assumption: The physical RAM is at least 8 GB; however, 16 GB is recommended.
Table 11
Property Name | Description | Rationale |
---|---|---|
-Xms2048m | Minimum Heap size | With 64-bit JVMs, the limit of 4 GB address space per process is removed and hence a large heap memory can be allocated to a process without any side effects. |
-Xmx3072m | Maximum Heap size | Same as above. |
-XX:+UseConcMarkSweepGC | Garbage Collection Algorithm | Works well for long running servers with a large Heap memory that can afford to share CPU cycles with the garbage collector. Would result in the lowest pause times. |
-XX:MaxGCPauseMillis=100000 | JVM Pause limit during GC | Limit the amount of time the garbage collector can pause the VM to be lower than the ESB call timeout (def: 120 seconds). |
-XX:MaxPermSize=512m | Perm Gen Space | When the number of components launched in-memory increases, so does the number of classes loaded. Perm Gen space stores the metadata of the JVM and is not part of the Heap space. |
-XX:+DisableExplicitGC | Disabling explicit GC requests | It is dangerous for a misbehaving in-memory component to continuously issue System.gc() calls when the heap memory allocated is large. This option makes the VM ignore explicit calls. The memory of the server is best managed by the VM. Garbage collection calls from within software code can be extremely dangerous. |
-XX:ParallelGCThreads=<no_of_cpu / no_of_competing_servers> | Option to limit monopolizing CPU resources | If no single application is to monopolize the CPU for a long period of time, it is recommended to limit the number of parallel threads which can be spawned. |
Handling Memory Problems
One common issue that users face is the server JVM exiting with a java.lang.OutOfMemoryError. This error is thrown when there is insufficient space to allocate an object: garbage collection cannot free any more space to accommodate the new object, and the heap cannot be expanded further. An OutOfMemoryError does not necessarily imply a memory leak; the server may simply need more memory to perform its operations.
The first step in diagnosing an OutOfMemoryError is to examine the full error message. The exception message generally carries additional information that hints at why the JVM ran out of memory. The following sections list some common examples of that additional information, what it may mean, and what to do about it.
Java Heap Space
This indicates that an object could not be allocated on the heap. The issue may simply be a configuration problem that is solved by assigning more heap space to the server, which can be done using the -Xmx option in the server.conf file. If enough memory has been assigned to the server according to the recommendations above and the server still runs out of Java heap space, there could be a memory leak at the server end, provided that an in-memory component has been ruled out as the cause. Custom components should be checked for memory leaks with the available profiling tools before they are run in-memory.
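Before raising -Xmx, it can help to see how close the heap actually gets to its limit. The sketch below uses the standard `java.lang.management` API to read heap usage of the JVM it runs in; the class name `HeapHeadroom` is hypothetical, and inspecting the Peer Server itself would require running such code inside the server VM or connecting over JMX, which is not shown here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Sketch: report how much of the maximum heap is currently in use. */
class HeapHeadroom {

    /** Fraction of the maximum heap in use, or -1.0 when the maximum is undefined. */
    static double usedFraction(long usedBytes, long maxBytes) {
        if (maxBytes <= 0) {
            return -1.0; // MemoryUsage.getMax() returns -1 when no maximum is defined
        }
        return (double) usedBytes / maxBytes;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        double fraction = usedFraction(heap.getUsed(), heap.getMax());
        System.out.println("heap used MB : " + (heap.getUsed() >> 20));
        System.out.println("heap max MB  : " + (heap.getMax() >> 20));
        System.out.println("used fraction: " + fraction);
    }
}
```

A used fraction that stays close to 1.0 even after full collections suggests either an undersized heap or a leak, which is the distinction the paragraph above draws.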
Debugging
In this case, a reproducible set of steps can be provided to Fiorano tech support. If the problem is not reproducible every time, GC logging can be enabled and the logs sent for analysis. This can also indicate that a profiling tool analysis is required at the server. Profiling tools help in monitoring the number of objects pending finalization and in viewing all reachable objects while understanding which references are keeping each one alive.
Permgen Space
This indicates that the permanent generation is full. This is the area where the JVM stores its metadata, including data about the loaded classes. If the Peer Server runs a lot of components in-memory, the perm gen space needs to be increased using the -XX:MaxPermSize option. Please follow the recommendations above to find a suitable value for the perm gen space.
Debugging
Increasing the perm gen space generally solves this problem, since this region is not directly available to programs and is therefore unlikely to leak. However, if the server loads an unexpectedly large number of classes when it launches components in-memory, there could be a problem. Using a Java profiling tool to determine the number of classes loaded by the Peer Server and the components running in-memory should help diagnose this situation.
Requested Array size exceeds VM Limit
This indicates that the application attempted to allocate an array whose size is larger than the available contiguous memory in the heap. In most cases, either the heap size is too small or set incorrectly, or a bug results in the server attempting to create an array whose size is incorrectly calculated. If Xmx is set to more than 256 MB and this error occurs repeatedly, the steps to reproduce the problem should be reported to the Fiorano team.
Swap Space
Another possibility is that the server runs out of swap space even though the heap usage is not near the specified Xmx value. On 32-bit machines, a process can have only up to 4 GB of addressable memory. Each Java process has several segments of memory for the Java memory and the native memory (memory used by the operating system to run the process). Native code competes with the JVM for the 4 GB of addressable space available to the application. Such problems generally occur when the heap size is set to more than 1.6 GB on a 32-bit machine, which does not leave enough memory for the native code to run. A simple solution may be to reduce the Xmx setting to a lower value. Please follow the recommendations above for maximum memory guidelines. This problem can also be solved by using a 64-bit JVM on a 64-bit server, which has practically no restrictions on the addressable space for a process.
Unable to Create New Native Thread
This means that there is not enough memory to create a new thread. This could be due to the JVM heap, other native memory, or Perm Gen memory sections taking up all the memory and leaving nothing for stack memory. It could also mean that the server has already spawned the maximum number of threads allowed by the operating system, though this is rare. If the former is the reason, the problem is generally solved by reducing the heap size and/or the PermGen space. If this does not solve the problem, a last option is to optimize the stack size of a thread.
The stack size can be set using -Xss. If threads then fail because the stack size is too small, this number needs to be increased and the operation retried. Setting the stack size to half of the original allows roughly twice the number of threads to be created, and this may solve the problem. If the above suggestions do not solve the problem, then the thread count, the number of components running in-memory, the number of JMS clients connected, and the frequency of messages should be observed and reported.
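The trade-off between heap, PermGen, and thread stacks inside a fixed address space can be illustrated with simple arithmetic. This is a crude upper-bound sketch, not a measurement: the class name `ThreadBudget`, the 512 MB allowance for other native memory, and the 512 KB starting stack size are assumptions picked for the example, and the address space actually usable by a 32-bit process is often less than the 4 GB figure used here.

```java
/** Sketch: upper bound on thread count once heap and PermGen are subtracted. */
class ThreadBudget {

    /** Threads that fit in the address space left over for stacks (0 if none). */
    static long maxThreads(long addressSpaceBytes, long heapBytes, long permGenBytes,
                           long otherNativeBytes, long stackBytes) {
        long left = addressSpaceBytes - heapBytes - permGenBytes - otherNativeBytes;
        return left <= 0 ? 0 : left / stackBytes;
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        long kb = 1L << 10;
        // 4 GB address space, -Xmx1280m, -XX:MaxPermSize=128m, 512 MB other native (assumed).
        long withDefaultStack = maxThreads(4096 * mb, 1280 * mb, 128 * mb, 512 * mb, 512 * kb);
        long withHalvedStack  = maxThreads(4096 * mb, 1280 * mb, 128 * mb, 512 * mb, 256 * kb);
        System.out.println("stack 512 KB: " + withDefaultStack); // prints 4352
        System.out.println("stack 256 KB: " + withHalvedStack);  // prints 8704
    }
}
```

Halving the stack size doubles the bound, and shrinking the heap or PermGen enlarges the space left for stacks, which matches the two remedies described above.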
Enabling GC logging
One of the easiest ways to get initial information about garbage collections is to specify the options -XX:+PrintGCDetails and -XX:+PrintGCTimeStamps. For every collection, these options output information such as the size of live objects before and after garbage collection for the various generations, the total available space for each generation, and the length of time the collection took. They also output a timestamp at the start of each collection, which helps in correlating GC logs with the server logs.
The verbose logging will be present in the server logs and in the output console of the server. This console output can be sent to the Fiorano team for further analysis.
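As a complement to the GC log, cumulative collection counts and times can also be read from inside a JVM through the standard `java.lang.management` API. The sketch below is illustrative only: the class name `GcStats` is hypothetical, it reports on the JVM it runs in rather than on a remote Peer Server, and for concurrent collectors the reported time is accumulated collection time rather than pure pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Sketch: print per-collector GC counts and average collection time. */
class GcStats {

    /** Average time per collection in milliseconds, or 0.0 when none have run. */
    static double averageCollectionMillis(long collectionCount, long collectionTimeMillis) {
        if (collectionCount <= 0) {
            return 0.0; // count is -1 when undefined for a collector
        }
        return (double) collectionTimeMillis / collectionCount;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            long timeMs = gc.getCollectionTime();
            System.out.println(gc.getName()
                + ": collections=" + count
                + ", total ms=" + timeMs
                + ", average ms=" + averageCollectionMillis(count, timeMs));
        }
    }
}
```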