Selection, Initialization and Activation of Application Specific Garbage Collector: SpecJVM2008


Nitan S. Kotwal1*, Shubhnandan S. Jamwal2† and Devanand3‡

1,2 University of Jammu, Jammu, India
3 Central University of Jammu, Jammu, India


Abstract. Different applications written in Java are suited to different garbage collectors (GCs). At present, the default garbage collector is selected according to the machine architecture and operating system: the serial collector is invoked for most applications running on client-class machines, and the parallel collector for applications running on server-class machines. In this paper we propose a solution for the intelligent selection and activation of a garbage collector, with appropriate parameters, by finding the optimal performance for each benchmark of SPECjvm2008. The parameter settings obtained from testing and analysis of the data are used to decide the heap size and the GC for each benchmark, so that each benchmark runs with the setting that performed best for it. Compared with the default collector, the proposed model improves throughput by 2% and reduces application execution time by 7%. Garbage collection time is reduced by 39%. The number of times the application is stopped for garbage collection is also reduced, and the memory reclaimed after minor collections is increased. Because it reduces the frequency of collection, the model may also be suitable for real-time applications.

Keywords: Metrics, intelligent selection, application-specific, optimal performance, collections.

1. Introduction

© Elsevier Publications 2014.

*Corresponding author. Research Scholar, Department of Computer Science & IT, University of Jammu, Jammu, India
†Assistant Professor, Department of Computer Science & IT, University of Jammu, Jammu, India
‡Dean and Professor, Department of Computer Science, Central University of Jammu, Jammu, India

Int. Conf. on Adv. in Comp., Comm., and Inf. Sci. (ACCIS-14) (1–10)

1.1 Why is Garbage Collection Necessary?

Garbage collection is the process by which memory occupied by dead objects, that is, objects no longer referenced from any live data structure of the program, is collected and returned to the pool of free memory. This task is entrusted to the garbage collector (GC) of the JDK. The four GCs in jdk1.7.0_04 are Serial, Parallel, ParallelOld, and Concurrent Mark Sweep. The default collector chosen by the JVM depends on the machine architecture and operating system. For a server-class machine with either the Server Virtual Machine (VM) or the Client VM, Parallel GC is chosen by default, and for a client-class machine with either the Server VM or the Client VM, Serial GC is the default collector. A GC can also be invoked explicitly on the command line by supplying a GC-specific option: Serial GC with -XX:+UseSerialGC, Parallel GC with -XX:+UseParallelGC, ParallelOld GC with -XX:+UseParallelOldGC, and Concurrent Mark Sweep GC with -XX:+UseConcMarkSweepGC. So far only Serial and Parallel GC are invoked automatically, depending on the class of machine and the operating system, but users can invoke a specific garbage collector depending on the characteristics of the application (Sun Microsystem, 2006).
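To check which collector a JVM actually selected, by default or through one of the options above, the standard java.lang.management API can be queried from inside the running program. The following is a minimal sketch and not part of the original experiments; the class name ActiveCollectors is illustrative, and the collector names reported vary between JVM versions and collectors.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Illustrative helper (hypothetical name): lists the collectors registered
// by the JVM that is running this program.
public class ActiveCollectors {

    /** Returns the names of the garbage collectors the running JVM reports. */
    public static List<String> activeCollectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Typically one entry for the young generation and one for the old generation.
        for (String name : activeCollectorNames()) {
            System.out.println(name);
        }
    }
}
```

Running the class once with the default settings and once with an explicit option such as -XX:+UseSerialGC shows whether the option changed the collector in use.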

1.2 Application-Specific GC

Garbage collectors have a negligible effect on applications that are small (kilobytes in size), but large applications (gigabytes or terabytes in size) must be executed with a specific collector. (S. Soman C. K., 2007) showed that the performance of an application depends on the application's behavior and the available resources. They also showed that no GC performs best for all applications and heap sizes, so a specific GC should be chosen for a particular application. It is likewise known that different applications have different GC requirements (C.R. Attanasio, 2003) (R. Fitzgerald, 2001). (T. Brecht, 2006) conclude from their experiments that the execution times of the applications they tested vary significantly with the scheduling algorithm used for garbage collection. They also showed that no single configuration of the BDW collector results in the fastest execution time for all applications. The scheduling algorithm that gives the fastest execution of an application also varies with the amount of memory available in the machine.

1.3 Metrics that Influence the Performance of an Application

To find the application-specific GC, (S. Soman C. K., 2007) considered only two metrics, application execution time and throughput, for measuring the performance of garbage collectors on different benchmarks. They report benchmark execution time and throughput over a wide range of heap sizes. The techniques used by Jeremy Singer et al. (J. Singer, 2007) achieved a 5% speedup in overall execution time (averaged across all test programs and all heap sizes) compared with selecting the default GC algorithm in every trial. They considered only one metric, application execution time, to select the application-specific GC. In some real-world problems, however, other metrics are also important and must be considered while selecting a GC for a specific application:

a) Number of minor/major collections
b) Average time taken by each collection
c) Application execution time
d) Throughput
e) Garbage collection time
f) Memory reclaimed after each collection

a) Number of minor/major collections: the number of times the application becomes unresponsive because the GC is running. It is a major parameter affecting the performance of the application. Ideally this number should be as small as possible, so that the application is stopped fewer times and the CPU remains available for executing the application.
b) Average time taken by each collection: the time interval during which the GC is running for one collection. This time is inversely related to the number of collections: if there are fewer collections, each pause is longer, and vice versa.
c) Application execution time: the time during which the application itself is running, defined as the difference between the total time taken by the application and the garbage collection time.
d) Throughput: the percentage of total time not spent in garbage collection.
e) Garbage collection time: the time spent collecting garbage.
f) Memory reclaimed after each collection: the memory reclaimed from dead objects after a minor or major collection.
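The definitions of metrics (b), (c), and (d) can be written out as simple arithmetic. The sketch below is illustrative only: the class and method names are hypothetical, the input numbers are invented for the example and are not measurements from this study, and computing the average time per collection as total GC time divided by the number of collections is an assumption consistent with, but not stated in, the definitions above.

```java
// Illustrative helper (hypothetical name) for the derived metrics defined above.
public class GcMetrics {

    /** Application execution time: total time minus garbage collection time. */
    public static double applicationTime(double totalSeconds, double gcSeconds) {
        return totalSeconds - gcSeconds;
    }

    /** Throughput: percentage of total time not spent in garbage collection. */
    public static double throughputPercent(double totalSeconds, double gcSeconds) {
        return 100.0 * (totalSeconds - gcSeconds) / totalSeconds;
    }

    /** Assumed definition: GC time divided by the number of collections. */
    public static double averageCollectionTime(double gcSeconds, int collections) {
        return collections == 0 ? 0.0 : gcSeconds / collections;
    }

    public static void main(String[] args) {
        // Invented example values: 50 s total run, 2 s in GC, 40 collections.
        double total = 50.0;
        double gc = 2.0;
        int collections = 40;
        System.out.println("application time (s): " + applicationTime(total, gc));
        System.out.println("throughput (%): " + throughputPercent(total, gc));
        System.out.println("average collection time (s): " + averageCollectionTime(gc, collections));
    }
}
```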

2. Related Works

(S. Soman C. K., 2004) showed that the overhead imposed by their system on application execution is 4% on average. Earlier research (J. Singer, 2007) (S. Soman C. K., 2004) (S. Soman C. K., 2007) on automatic selection of a GC considered only application execution time, looking for the shortest execution time. This metric was used to measure the performance of the application under different GCs. Other important metrics (e.g. number of pauses, average pause time, memory reclaimed after each collection) were not considered. The four GC systems considered in their work were Semispace (SS), Mark-sweep (MS), a Generational Semispace Hybrid (GSS), and a Generational Mark-sweep Hybrid (GMS). These systems run in stop-the-world fashion: the application is halted while the garbage collector is running. Their switching decision was based on the available heap size (J. Singer, 2007) (S. Soman C. K., 2004).

In our earlier research (N.S. Kotwal S. J., 2013) (N.S. Kotwal S. J., 2013) we performed experiments on four garbage collectors: the Serial, Parallel, ParallelOld, and ConcMarkSweep collectors. With the Serial collector, the application is halted while collection takes place, i.e. it works in stop-the-world fashion. With the Parallel collector, minor collections are performed in parallel by multiple GC threads, but still in stop-the-world fashion, and major collections are performed serially. With the ParallelOld collector, both minor and major collections are performed in parallel, with compaction, but still in stop-the-world fashion. With the ConcMarkSweep collector, minor collections are performed serially in stop-the-world fashion, but major collections are performed concurrently with the execution of the application.

The research of (S. Soman C. K., 2004) (S. Soman C. K., 2007) targeted server systems where resources are suddenly constrained. Their main aim was to prevent OutOfMemory errors while the operating system is reclaiming memory, and their main task was to reduce application execution time on server systems. Our framework, in contrast, is designed mainly for the client class. Metrics such as the number of pauses (minor/major), the average pause time of each collection, and the memory reclaimed after each collection were not important in server systems. On client-class systems, however, these metrics are more important than the execution time of the mutator, because real-time and interactive applications require a prompt response; in that case application execution time matters less, and pause times should be negligible. Their proposed future work was to consider metrics such as frequency of collections, allocation rates, and memory behavior to guide the selection of collection and allocation algorithms. These metrics are included in our study.

(J. Singer, 2007) incorporated a machine learning approach that takes a description of application execution and predicts which GC algorithm performs best for a given JVM heap size, without running all six GC algorithms on that application. (S. Soman C. K., 2004) (S. Soman C. K., 2007) developed a framework that can switch automatically between GCs without restarting and possibly rebuilding the execution environment; their system can switch between collection strategies during the execution of the program. The experimental design of (S. M. Blackburn, 2004) exposes key algorithmic features and how they match program characteristics, in order to explain the direct and indirect costs of garbage collection as a function of heap size on the SPEC JVM benchmarks. They find that the contiguous allocation of copying collectors attains significant locality benefits over free-list allocators. The reduced collection cost of generational algorithms, together with the locality benefit of contiguous allocation, motivates a copying nursery for newly allocated objects. These advantages outweigh the overheads of generational collectors compared with non-generational ones. (K. Barabash, 2003) improved throughput, stack, and cache behavior without compromising short pauses and high scalability.

3. Objectives of Current Research

The objective of the current research paper is to develop a solution for the intelligent selection and activation of a GC, with an appropriate environment, by finding the optimal performance for each benchmark of SPECjvm2008.

4. Experimentation

4.1 Performance of GC in a Real JVM

The benchmarks used in this research are shown in Table 1. We fixed the heap size by setting the parameters -Xms and -Xmx to the same value, which ensures that the benchmark runs within that size. Different GCs are known to perform differently over different heap sizes. To obtain a large amount of training data, we executed each benchmark over heap sizes from 20 MB to 400 MB in increments of 20 MB. We found that several benchmarks were not able to run with a small heap. To obtain a representative value, each benchmark was executed 10 times at each fixed heap size with every collector, and the arithmetic mean was taken. All tests were carried out in a fixed environment with the Client VM.
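The experimental grid described above (heap sizes from 20 MB to 400 MB in 20 MB steps, with -Xms equal to -Xmx, and the arithmetic mean over repeated runs) can be sketched as follows. This is an illustration under stated assumptions, not the harness used in the study: the class and method names are hypothetical, and the "m" suffix is the standard HotSpot notation for megabytes.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper (hypothetical name) for building the heap-size sweep.
public class HeapSweep {

    /** Heap sizes in MB from 20 to 400 in steps of 20. */
    public static List<Integer> heapSizesMb() {
        List<Integer> sizes = new ArrayList<>();
        for (int mb = 20; mb <= 400; mb += 20) {
            sizes.add(mb);
        }
        return sizes;
    }

    /** Fixes the heap by giving -Xms and -Xmx the same value. */
    public static List<String> fixedHeapOptions(int mb) {
        List<String> options = new ArrayList<>();
        options.add("-Xms" + mb + "m");
        options.add("-Xmx" + mb + "m");
        return options;
    }

    /** Arithmetic mean of the values measured over repeated runs. */
    public static double arithmeticMean(double[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("at least one sample is required");
        }
        double sum = 0.0;
        for (double s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    public static void main(String[] args) {
        for (int mb : heapSizesMb()) {
            System.out.println(fixedHeapOptions(mb));
        }
    }
}
```

With four collectors, 20 heap sizes, and 10 repetitions, each benchmark is run up to 800 times, fewer where the benchmark cannot run in a small heap.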

4.2 Analysis and Results

After performing the tests we obtained results for the parameters discussed in Section 1.3, which affect the performance of an application. Previous studies were based on application execution time or throughput (J. Singer, 2007) (S. Soman C. K., 2004) and ignored other parameters that are equally important for finding the appropriate collector for a specific benchmark. If the execution time of an application is short but the number of pauses or the average pause time per collection is significant, the configuration is not good for interactive applications or for applications that need client-side processing. Also, if the memory reclaimed by each collection is not significant, collections will occur more frequently. Selecting the appropriate GC for a specific application therefore requires profiling all the parameters for that application. In our previous work (N.S. Kotwal S. J., 2013) (N.S. Kotwal S. J., 2013) we obtained the values of all the parameters. By analysing these values we identified the appropriate parameters, which should be used for the intelligent selection and activation of a specific GC for a specific benchmark to obtain optimal performance. The analysis gave the following results.

The startup benchmark should be executed with ParallelOld GC. It is initialized at a 20 MB heap size, and the heap size at which startup gives optimal performance for all the parameters is 220 MB. The compiler benchmark should be executed with the Serial collector at a 280 MB heap size. The compress benchmark gives optimal performance at 280 MB with the ConcMarkSweep collector. The crypto benchmark should be executed with ParallelOld GC at a 240 MB heap size. The derby benchmark gives optimal performance at 400 MB when executed with the Parallel collector. The ParallelOld collector is appropriate for the mpegaudio benchmark at 360 MB. The heap size at which scimark.large gives optimal performance for all the parameters is 380 MB, and it should be executed with the ConcMarkSweep collector. The scimark.small benchmark should be executed with parallel collector at a 320 MB heap size. The serial benchmark should be executed with the Parallel collector at 380 MB. The ConcMarkSweep collector is suitable for the sunflow benchmark, and the heap size at which it gives optimal performance is 380 MB. The xml benchmark should be executed at 400 MB with the ConcMarkSweep collector.

Table 1: Benchmarks of SPECjvm2008 and their description

Benchmark | Description
Startup | Starts each benchmark for one operation. The startup benchmark is single-threaded.
Compiler | Uses the OpenJDK (JDK 7 alpha) front-end compiler to compile a set of .java files.
Compress | Compresses data using a modified Lempel-Ziv method (LZW).
Crypto | Encrypts and decrypts data using AES (crypto.aes) and RSA (crypto.rsa), and performs sign verification (crypto.signverify).
Derby | Focuses on BigDecimal computations, database logic, and lock behavior.
MPEGaudio | Performs mp3 decoding.
Scimark | Widely used by the industry as a floating-point benchmark. There are two versions of this test, one with a "large" dataset (32 Mbytes) and another with a "small" dataset (512 Kbytes).
Serial | Serializes and deserializes primitives and objects, using data from the JBoss benchmark.
Sunflow | A multi-threaded benchmark based on an image rendering system.
XML | Applies style sheets to XML documents using javax.xml.transform and validates XML documents using javax.xml.validation.

5. Proposed Model

Considering all the parameters, we found a specific GC for each benchmark. The trade-offs among the parameters at which a GC gives optimal performance for a benchmark were noted for implementation. The benchmarks are categorized into groups: those that give optimal performance with Serial GC are put in one group, those that perform better with Parallel GC in a second group, and so on. To limit the complexity of the mapping between benchmarks and GCs, we made four groups according to the collector each benchmark needs. The model is shown in Figure 1.

After categorizing the benchmarks into four groups, we also determined the heap size at which each GC gives optimal performance. These findings are based on the trade-off between the different parameters. Intelligent selection and activation of the GC in the proposed model is explained in the following phases:

Phase I. Selection of Group

The benchmarks of SPECjvm2008 are divided into four groups according to the GC they require. Group one (G1) has four benchmarks: startup, crypto, mpegaudio, and scimark.small. Group two (G2) has one benchmark, compiler. Compress, scimark.large, sunflow, and xml form group three (G3). Group four (G4) has two benchmarks, derby and serial.

Phase II. Selection of Process

Once an application is given as input, the process of selecting a specific GC for it begins. The GC selected is the one that gives optimal performance across the different parameters. The choice of a suitable GC is derived from the analysis of empirical tests on a real JVM. The analysis shows that ParallelOld GC is best suited to all the benchmarks of G1, Serial GC to all the benchmarks of G2, ConcMarkSweep GC to all the benchmarks of G3, and Parallel GC to all the benchmarks of G4.

Phase III. Setting the Size

The next step is setting the heap size with which each GC executes a particular benchmark. The optimal heap size is found by empirical analysis, since each GC performs best at a suitable heap size. The optimal heap size for the startup benchmark is 220 MB, and for the crypto benchmark it is 240 MB. The heap size obtained for compiler and compress is 280 MB. The optimal heap size for scimark.small is 320 MB. The initial and final heap size for mpegaudio should be fixed at 360 MB. The scimark.large, serial, and sunflow benchmarks need a larger heap of 380 MB for optimal performance. The initial and final heap size for the derby and xml benchmarks should be fixed at 400 MB.

Phase IV. Invocation of GC

The selection is based on the results obtained by performing the tests on all the benchmarks. The last step in the proposed model of intelligent selection and activation of the GC is the invocation of the GC for the specific benchmark. Each GC has different parameter values for each benchmark for obtaining the best performance. The real JVM is activated with those parameters.

Figure 1: Intelligent Garbage Collector Selection. [Flowchart: the input benchmark x is matched to its group (G1 = startup, crypto, mpegaudio, scimark.small; G2 = compiler; G3 = compress, scimark.large, sunflow, xml; G4 = derby, serial) and then to the individual benchmark; the heap size is set with -Xmx (the figure shows 220 MB); and the JVM invokes the garbage collector with one of -XX:+UseParallelOldGC, -XX:+UseSerialGC, -XX:+UseConcMarkSweepGC, or -XX:+UseParallelGC.]

Table 2: Average of all benchmarks of SPECjvm2008 with default garbage collector for various metrics.

Table 3: Average of all benchmarks of SPECjvm2008 with specific garbage collector and heap size for various metrics.
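Phases I to IV amount to a table lookup followed by assembling JVM options. The sketch below illustrates this using the groups from Phase I, the collectors from Phase II, and the heap sizes from Phase III; it is not the authors' implementation. The class name GcSelector, the nested Setting type, and the method jvmOptions are hypothetical, and the "m" suffix is the standard HotSpot notation for megabytes. The collector flags are those named in this paper for jdk1.7.0_04, and newer JDK releases may not accept all of them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative lookup (hypothetical names) from benchmark to GC flag and heap size.
public class GcSelector {

    /** Collector flag and fixed heap size (MB) for one benchmark. */
    private static final class Setting {
        final String gcFlag;
        final int heapMb;

        Setting(String gcFlag, int heapMb) {
            this.gcFlag = gcFlag;
            this.heapMb = heapMb;
        }
    }

    private static final Map<String, Setting> TABLE = new LinkedHashMap<>();

    private static void put(String benchmark, String gcFlag, int heapMb) {
        TABLE.put(benchmark, new Setting(gcFlag, heapMb));
    }

    static {
        // G1: ParallelOld GC
        put("startup", "-XX:+UseParallelOldGC", 220);
        put("crypto", "-XX:+UseParallelOldGC", 240);
        put("mpegaudio", "-XX:+UseParallelOldGC", 360);
        put("scimark.small", "-XX:+UseParallelOldGC", 320);
        // G2: Serial GC
        put("compiler", "-XX:+UseSerialGC", 280);
        // G3: ConcMarkSweep GC
        put("compress", "-XX:+UseConcMarkSweepGC", 280);
        put("scimark.large", "-XX:+UseConcMarkSweepGC", 380);
        put("sunflow", "-XX:+UseConcMarkSweepGC", 380);
        put("xml", "-XX:+UseConcMarkSweepGC", 400);
        // G4: Parallel GC
        put("derby", "-XX:+UseParallelGC", 400);
        put("serial", "-XX:+UseParallelGC", 380);
    }

    /** Returns the JVM options (GC flag, then -Xms and -Xmx) for a benchmark. */
    public static List<String> jvmOptions(String benchmark) {
        Setting setting = TABLE.get(benchmark);
        if (setting == null) {
            throw new IllegalArgumentException("unknown benchmark: " + benchmark);
        }
        List<String> options = new ArrayList<>();
        options.add(setting.gcFlag);
        options.add("-Xms" + setting.heapMb + "m");
        options.add("-Xmx" + setting.heapMb + "m");
        return options;
    }

    public static void main(String[] args) {
        for (String benchmark : TABLE.keySet()) {
            System.out.println(benchmark + " -> " + jvmOptions(benchmark));
        }
    }
}
```

Keeping the mapping in a single table makes it easy to re-derive the groups and heap sizes if the profiling is repeated on another JVM version or machine class.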

6. Implementation and Testing

We performed the tests on Java version jdk1.7.0_04, with ergonomics machine class client and the Java HotSpot(TM) Client VM. The benchmarks of SPECjvm2008 were executed over heap sizes from 20 MB to 400 MB in increments of 20 MB. Each benchmark was executed 10 times at each fixed heap size and the arithmetic mean was taken. These tests were performed with all four GCs. For the default collector, the arithmetic mean of the values obtained at each heap size was taken over all benchmarks. To compare the proposed model with the existing approach to garbage collection, we calculated the arithmetic mean over all the benchmarks, each run with its specified heap size and GC.

7. Conclusions

The results are summarized in Table 2 and Table 3. Comparing the proposed model with the existing approach, we obtained the following results. Overall application execution time is reduced by 7% and throughput is improved by 2%. Total garbage collection time over all the benchmarks is reduced by 39%. Memory reclaimed after minor collections is increased by 12%, while memory reclaimed after major collections is reduced by 50%. The number of times the application is paused for a minor collection is reduced by 43%, while the average time to collect young objects is increased by 13% (this has little effect because it is a very small fraction of a second). The number of times the application is halted by a full collection is reduced by 89%, and the average time to collect old objects is reduced by 45%. Earlier machine learning techniques (J. Singer, 2007) use at least one profiling run of the benchmark to find its dynamic features. In future we will improve this model by incorporating machine learning techniques to predict the appropriate GC, using the static features of the benchmarks. Combinations of collectors will also be tried for optimal results, as will other options available for tuning GCs on the JVM.

References

[1] C.R. Attanasio, D. F. (2003). A Comparative Evaluation of Parallel Garbage Collectors. In Proc. of the 14th Annual Workshop on Languages and Compilers for Parallel Computing (pp. 177-192). Berlin: Springer-Verlag.

[2] J. Singer, G. B. (2007). Intelligent Selection of Application-Specific Garbage Collectors. In Proc. of the 6th Int. Symposium on Memory Management (pp. 91–102). New York: ACM.

[3] K. Barabash, Y. O. (2003). Mostly Concurrent Garbage Collection Revisited. OOPSLA ‘03 Proc. of the 18th Annual ACM SIGPLAN Conf. on Object-Oriented Prog., Languages, and App. (pp. 255–268). New York: ACM Press.

[4] N.S. Kotwal, S. J. (2013, June). Memory Reclamation by Garbage Collectors: SPECjvm2008. Int. J. of Adv. Research in Comp. Science, 4(8), 333–337.

[5] N.S. Kotwal, S. J. (2013, August). Selection of Application Specific Garbage Collector in Client Class. Int. J. of Adv. Research in Comp. Science and Software Engineering, 3(8), 1319–1330.

[6] R. Fitzgerald, D. T. (2001). The Case for Profile-directed Selection of Garbage Collectors. In Proc. of the 2nd Int. Symposium on Memory Management. 36, pp. 111-120. New York: ACM SIGPLAN Notices.



[7] S. M. Blackburn, P. C. (2004). Myths and Realities: The Performance Impact of Garbage Collection. Proc. of the Joint Int. Conf. on Measurement and Modeling of Comp. Sys. (pp. 25-36). New York: ACM Press.

[8] S. Soman, C. K. (2004). Dynamic Selection of Application-Specific Garbage Collectors. In Proc. of the 4th Int. Symposium on Memory Management (pp. 49–60). New York: ACM.

[9] S. Soman, C. K. (2007). Application-specific Garbage Collection. J. of Sys. and Software, 80(7), 1037-1056.

[10] Sun Microsystem. (2006, April 1). Memory Management in the Java HotSpot Virtual Machine. Retrieved from Oracle: http://www.oracle.com/technetwork/java/javase/memorymanagement-whitepaper-150215.pdf

[11] T. Brecht, E. A. (2006, September). Controlling Garbage Collection and Heap Growth to Reduce the Execution Time of Java Applications. ACM Transactions on Programming Languages and Systems, 28(5), 908–941.
