KK List of Java 5


Composition of the Young Generation

In order to understand GC, let's learn about the young generation,
where the objects are created for the first time. The young generation
is divided into 3 spaces.

One Eden space

Two Survivor spaces

There are various techniques to improve the performance of
your Java application. In this article I will talk about Statement
Pooling configuration and its effect on the Garbage
Collection process.

Statement Pooling improves the performance of an
application by caching SQL statements that are used
repeatedly. This caching mechanism prepares frequently
used statements only once and reuses them multiple
times, thus reducing the overall number of times the database
server has to parse, plan, and optimize these queries. A
well-configured number of cached statements (maxStatements)
can be as effective as tuning the Garbage Collection. Now
let's see how Statement Pooling can affect the Garbage
Collection.

Why Check the Number of Statements in the Pool?

Often the size of the JDBC statement pool is left at its default
value. Using the default value does not usually lead
to any particular problem. But a well-configured maxStatements
value can be as effective as GC tuning. If you are using the
default maxStatements value and would like to optimize the use
of memory, consider the correct statement pool size
before attempting GC tuning.

As was discussed in Understanding Java Garbage Collection,
the weak generational hypothesis (most objects quickly
become unreachable, and a reference from an old object to a
new object is rare) was used as the precondition when designing
the garbage collector in Java. The majority of NHN web services
must respond within 300ms at the latest, unless it
is a special case. Therefore, the hypothesis applies to NHN web
services even more strongly than to general stand-alone
applications.

The GC Process between HTTP Request and Response

When developing a web service using a web container like
Tomcat together with other frameworks, the lifespan of objects
created by a developer tends to be either very short or very long.
Web developers usually write code such as an Interceptor, Action,
BO, or DAO (BO and DAO are created as singletons by the
applicationContext in Spring, and are not the target of GC). The
objects generated by this code stay alive only for the brief
time between the arrival of the HTTP request and the
sending of the response. For this reason, such objects are usually
collected during Young GC.

There are also objects, such as singleton objects, that stay
alive for the whole lifecycle of Tomcat. Such
objects will be promoted to the old area soon after Tomcat
starts running. Yet, when continuously monitoring web
applications through jstat and the like, there are always some
objects promoted to the old area during Young GC. These are
usually objects stored in the caches that most containers and
projects use to improve framework performance. Whether the
cached objects become the target of GC or not is determined by
their cache hit ratio, not their age, so unless the hit ratio is
100%, they cannot avoid being promoted to the old area, even when
the Young GC cycle is set to be long.

Among these caches, statement pooling affects
memory usage the most. If you are using iBatis, you are using
statement pooling, because iBatis processes every SQL statement
as a PreparedStatement. If the statement pool is smaller
than the number of SQL statements in use, the cache hit ratio
decreases and a cache maintenance cost is incurred. Statement
objects that have already reached the old area are evicted from
the cache, become the target of GC, and are collected; they are
then regenerated during a later HTTP request, only to be cached
and promoted to the old area again. The full GC cycles are
affected by this process.
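
The effect of an undersized pool on the hit ratio can be sketched with a small simulation. This is only an illustration under stated assumptions: the cache is modeled as an LRU map keyed by SQL text, which is not necessarily how any particular JDBC pool evicts statements, and the class name, method names, and query strings are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a per-connection statement cache modeled as an LRU map.
class StatementCacheSketch {

    /** LRU cache keyed by SQL text; the value stands in for a prepared statement. */
    static final class LruCache extends LinkedHashMap<String, Object> {
        private final int maxStatements;

        LruCache(int maxStatements) {
            super(16, 0.75f, true); // accessOrder = true gives LRU ordering
            this.maxStatements = maxStatements;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<String, Object> eldest) {
            // Evict the least recently used statement once the pool is over capacity.
            return size() > maxStatements;
        }
    }

    /** Cycles through distinctSqls different queries and returns the cache hit ratio. */
    static double hitRatio(int maxStatements, int distinctSqls, int requests) {
        LruCache cache = new LruCache(maxStatements);
        int hits = 0;
        for (int i = 0; i < requests; i++) {
            String sql = "SELECT * FROM t WHERE q = " + (i % distinctSqls);
            if (cache.get(sql) != null) {
                hits++;
            } else {
                // A miss: "prepare" a new statement object and cache it.
                cache.put(sql, new Object());
            }
        }
        return (double) hits / requests;
    }

    public static void main(String[] args) {
        System.out.println("pool >= SQL count: " + hitRatio(100, 100, 10_000));
        System.out.println("pool <  SQL count: " + hitRatio(50, 100, 10_000));
    }
}
```

With a pool at least as large as the number of distinct statements, only the first pass misses. With a smaller pool and this cyclic access pattern, which is the worst case for LRU, every request misses, so every request creates a new statement object that replaces an older one.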

Size of the Statement Objects

It would be safe to say that the size of a single statement object
is proportional to the length of the SQL code processed by the
same statement. Even for a long and complex SQL, the size of
the object should be around 500 bytes. The object's small size
would seem to have little effect on the full GC cycles, but such
an assumption would be incorrect.

When you look at the JDBC specifications, each connection has
its own statement pool (maxStatementsPerConnection), as
described in Figure 1 below. So, although a statement object is
as small as 500 bytes, with many connections the
statement cache can occupy a proportionally large share of the
heap.
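
A back-of-the-envelope estimate makes the point concrete. The 500-byte figure comes from the text above; the connection count and per-connection pool size below are assumed values chosen only for illustration, and the class and method names are hypothetical.

```java
// Hypothetical sketch: rough heap footprint of per-connection statement caches.
class StatementCacheFootprint {

    /** Estimated bytes held by cached statements across all pooled connections. */
    static long estimateBytes(int connections, int statementsPerConnection, long bytesPerStatement) {
        return (long) connections * statementsPerConnection * bytesPerStatement;
    }

    public static void main(String[] args) {
        // Assumed figures: 50 connections, 200 cached statements each, 500 bytes per statement.
        long bytes = estimateBytes(50, 200, 500);
        System.out.println(bytes + " bytes"); // 5000000 bytes, roughly 5 MB
    }
}
```

The estimate counts only the statement objects themselves, so it is a lower bound on what the cache keeps alive in the old area.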

Figure 1: Relationship between the Connection and the Statement.
(Though the statement has the ResultSet, it should be clarified
that ResultSet is not an object for caching. The ResultSet
reference is set to null when iBatis calls rs.close(), and the
object is then collected in the young area during Young GC.)

There are 3 spaces in total, two of which are Survivor spaces. Objects
move through the spaces in the following order:
1. The majority of newly created objects are located in the Eden
space.
2. After one GC in the Eden space, the surviving objects are moved
to one of the Survivor spaces.
3. After a GC in the Eden space, the objects are piled up into the
Survivor space, where other surviving objects already exist.
4. Once a Survivor space is full, surviving objects are moved to the
other Survivor space. Then, the Survivor space that is full will be
changed to a state where there is no data at all.
5. The objects that still survive after these steps have been
repeated a number of times are moved to the old generation.
As you can see by checking these steps, one of the Survivor spaces
must remain empty. If data exists in both Survivor spaces, or the
usage is 0 for both spaces, then take that as a sign that something
is wrong with your system.
The process of data piling up into the old generation through minor
GCs can be shown as in the below chart:

Figure 3: Before & After a GC.
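
The movement of objects through Eden, the two Survivor spaces, and the old generation can be sketched as a toy model. This is a simplification, not HotSpot's implementation: it assumes live objects are copied to the empty Survivor space on every minor GC, and the class, field names, and promotion threshold are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical toy model of a generational heap; sizes and capacity limits are ignored.
class YoungGenSketch {

    static final class Obj {
        final String name;
        int age; // number of minor GCs this object has survived
        Obj(String name) { this.name = name; }
    }

    final List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>(); // Survivor space currently holding objects
    List<Obj> to = new ArrayList<>();   // Survivor space kept empty between GCs
    final List<Obj> old = new ArrayList<>();
    final int tenuringThreshold;        // assumed promotion age

    YoungGenSketch(int tenuringThreshold) {
        this.tenuringThreshold = tenuringThreshold;
    }

    /** New objects are created in Eden. */
    Obj allocate(String name) {
        Obj o = new Obj(name);
        eden.add(o);
        return o;
    }

    /** Copies reachable objects out of Eden and "from", then swaps the Survivor roles. */
    void minorGc(Predicate<Obj> isReachable) {
        copySurvivors(eden, isReachable);
        copySurvivors(from, isReachable);
        eden.clear();
        from.clear();
        List<Obj> tmp = from;
        from = to;
        to = tmp;
    }

    private void copySurvivors(List<Obj> space, Predicate<Obj> isReachable) {
        for (Obj o : space) {
            if (!isReachable.test(o)) {
                continue; // unreachable objects are simply not copied
            }
            o.age++;
            if (o.age >= tenuringThreshold) {
                old.add(o); // long-surviving objects are promoted
            } else {
                to.add(o);
            }
        }
    }

    public static void main(String[] args) {
        YoungGenSketch heap = new YoungGenSketch(3);
        Obj cached = heap.allocate("cached statement");
        heap.allocate("request-scoped object");
        for (int gc = 1; gc <= 3; gc++) {
            heap.minorGc(o -> o == cached);
            System.out.println("after GC " + gc + ": from=" + heap.from.size()
                    + " to=" + heap.to.size() + " old=" + heap.old.size());
        }
    }
}
```

In this model the "to" space is empty after every collection, which matches the observation that one of the Survivor spaces must remain empty.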


Note that in HotSpot VM, two techniques are used for faster memory
allocations. One is called "bump-the-pointer," and the other is
called "TLABs (Thread-Local Allocation Buffers)."
The bump-the-pointer technique tracks the last object allocated in the
Eden space. That object is located at the top of the Eden space. When
a new object is created, only its size needs to be checked against the
remaining Eden space. If the object fits, it is placed in the Eden
space, and the new object becomes the top. So, when new objects are
created, only the most recently added object needs to be checked,
which allows much faster memory allocations. However, it is a
different story in a multithreaded environment. To keep allocation
thread-safe when multiple threads create objects in the Eden space, a
lock is unavoidable, and performance drops due to lock contention.
TLABs are the solution to this problem in HotSpot VM. Each thread is
given a small portion of the Eden space as its own share. As each
thread can access only its own TLAB, even the bump-the-pointer
technique can allocate memory without a lock.
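
The two techniques can be sketched as follows. This is a conceptual model only, not how HotSpot is implemented: addresses are plain integer offsets, and the class names, method names, and sizes are hypothetical.

```java
// Hypothetical sketch of bump-the-pointer allocation and TLABs using integer offsets.
class BumpPointerSketch {

    /** A contiguous region with a "top" pointer; allocating just bumps the pointer. */
    static final class Region {
        private final int end;
        private int top;

        Region(int start, int size) {
            this.top = start;
            this.end = start + size;
        }

        /** Returns the start offset of the new object, or -1 if it does not fit. */
        int allocate(int size) {
            if (top + size > end) {
                return -1; // only the remaining room above "top" is checked
            }
            int address = top;
            top += size;
            return address;
        }
    }

    /** Shared Eden: handing out a TLAB is the only step that takes a lock. */
    static final class Eden {
        private final Region region;
        private final int tlabSize;

        Eden(int size, int tlabSize) {
            this.region = new Region(0, size);
            this.tlabSize = tlabSize;
        }

        /** Carves a thread-private region out of Eden, or returns null if Eden is full. */
        synchronized Region newTlab() {
            int start = region.allocate(tlabSize);
            return start < 0 ? null : new Region(start, tlabSize);
        }
    }

    public static void main(String[] args) {
        Eden eden = new Eden(1024, 256);
        Region tlabA = eden.newTlab(); // would belong to thread A
        Region tlabB = eden.newTlab(); // would belong to thread B
        System.out.println(tlabA.allocate(16)); // allocation inside a TLAB needs no lock
        System.out.println(tlabB.allocate(16));
    }
}
```

Because each thread bumps the pointer only inside its own region, the synchronized step is needed only when a thread asks for a fresh TLAB.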
This has been a quick overview of the GC in the young generation.
You do not necessarily have to remember the two techniques that I
have just mentioned. You will not go to jail for not knowing them. But
please remember that objects are first created in the Eden
space, and the long-surviving objects are moved to the old generation
through the Survivor space.

GC for the Old Generation


The old generation basically performs a GC when it becomes full. The
execution procedure varies by GC type, so it is easier to
understand if you know the different types of GC.
As of JDK 7, there are 5 GC types.
1. Serial GC
2. Parallel GC
3. Parallel Old GC (Parallel Compacting GC)
4. Concurrent Mark & Sweep GC (or "CMS")
5. Garbage First (G1) GC
Among these, the serial GC must not be used on a production
server. This GC type was created when desktop computers had only
one CPU core. Using the serial GC will drop application
performance significantly.
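
To check which collectors a running JVM is actually using, the standard java.lang.management API can be queried. The sketch below is only one way to do this; the class name is hypothetical, and the collector names printed depend on the JVM version and the GC flags in effect.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that lists the garbage collectors active in this JVM.
class ActiveCollectors {

    /** Returns the names the JVM reports for its young and old generation collectors. */
    static List<String> names() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Collection count and accumulated time since the JVM started.
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
    }
}
```

Running it with different GC flags is a quick way to confirm that a flag has taken effect.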
Now let's learn about each GC type.

Serial GC (-XX:+UseSerialGC)
The GC in the young generation uses the type we explained in the
previous paragraph. The GC in the old generation uses an algorithm
called "mark-sweep-compact."
1. The first step of this algorithm is to mark the surviving objects
in the old generation.
2. Then, it checks the heap from the front and leaves only the
surviving ones behind (sweep).
3. In the last step, it fills up the heap from the front with the
objects so that the objects are piled up consecutively, and
divides the heap into two parts: one with objects and one
without objects (compact).
The serial GC is suitable for a small memory and a small number of
CPU cores.
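
The three steps can be sketched on a toy heap. This is an illustration only: the heap is an array of named slots, the set of reachable names is passed in rather than found by tracing references, and all names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of mark-sweep-compact on an array of slots (null = free slot).
class MarkSweepCompactSketch {

    /** Returns a new heap in which the surviving objects are packed at the front. */
    static String[] collect(String[] heap, Set<String> reachable) {
        String[] result = heap.clone();

        // Mark: record which slots hold surviving objects.
        boolean[] marked = new boolean[result.length];
        for (int i = 0; i < result.length; i++) {
            marked[i] = result[i] != null && reachable.contains(result[i]);
        }

        // Sweep: walk the heap from the front and free every unmarked slot.
        for (int i = 0; i < result.length; i++) {
            if (!marked[i]) {
                result[i] = null;
            }
        }

        // Compact: slide survivors toward the front so free space is contiguous.
        int free = 0;
        for (int i = 0; i < result.length; i++) {
            if (result[i] != null) {
                String survivor = result[i];
                result[i] = null;
                result[free++] = survivor;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] heap = {"a", "b", null, "c", "d"};
        Set<String> reachable = new HashSet<>(Arrays.asList("a", "c", "d"));
        System.out.println(Arrays.toString(collect(heap, reachable))); // [a, c, d, null, null]
    }
}
```

After compaction the heap is split into one part with objects and one part without, as described in step 3.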

Parallel GC (-XX:+UseParallelGC)

Figure 4: Difference between the Serial GC and Parallel GC.


From the picture, you can easily see the difference between the serial
GC and parallel GC. While the serial GC uses only one thread to
process a GC, the parallel GC uses several threads and is therefore
faster. This GC is useful when there is enough memory
and a large number of cores. It is also called the "throughput GC."
