DUPLICATE STRINGS

Solution 1: String#intern()

One way to reduce memory consumption is to use the String#intern() API. Example:

"Hello World".intern();

What does the intern() API do? When intern() is invoked on a string, the JVM checks whether an equal string already exists in the string pool. If it does, the pre-existing string is returned. If it doesn't, the new string is added to the pool and returned. In effect, String#intern() maintains a cache of strings so that you don't end up with multiple instances of the same string in your application, which reduces string duplication.
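The pool behaviour described above can be seen in a minimal sketch like the one below. The class name InternDemo and the helper sameObject are hypothetical names chosen for illustration; new String(...) is used only to force two distinct objects with equal contents.

```java
public class InternDemo {
    // Returns true only when both references point to the same String object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        // new String(...) creates two separate heap objects with equal contents.
        String first = new String("Hello World");
        String second = new String("Hello World");

        System.out.println(sameObject(first, second));                   // false
        System.out.println(first.equals(second));                        // true
        // After interning, both calls return the single pooled instance.
        System.out.println(sameObject(first.intern(), second.intern())); // true
    }
}
```

Note that intern() returns the pooled instance; the duplicate only becomes eligible for garbage collection once the application stops referencing it, so the return value should be the one that is stored.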

However, when you intern strings, keep the following in mind:

  • String#intern() adds some CPU overhead. Whether that overhead is noticeable depends on your application and on the number of String objects that are interned.
  • Interning only makes reference comparison (==) reliable when every string being compared has been interned. String.equals() and String.compareTo() work whether or not the strings are interned, but code that relies on == while some strings are not interned can produce tricky bugs.
  • String#intern() returns the exact same String object for all equal strings. This means you need to be very cautious when using String objects for synchronization, because unrelated code that locks on equal interned strings ends up sharing one lock. In fact, it is highly recommended not to use String objects for synchronization at all.
  • Up to Java 6, interned strings were stored in the permanent generation. Starting with Java 7, interned strings are stored in the main Java heap. So if you are running Java 6 or earlier, make sure to allocate sufficient PermGen space for your application.
  • String#intern() can also increase the latency (i.e. response time) of a transaction, because it is invoked by the application thread as part of the transaction.
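Two of the caveats above (reference comparison and synchronization) can be illustrated with a small sketch. The class InternPitfallDemo, its methods, and the "ORDER-42" value are hypothetical and exist only for this example.

```java
public class InternPitfallDemo {
    // Buggy style: relies on reference equality instead of equals().
    static boolean matchesByReference(String expected, String actual) {
        return expected == actual;
    }

    // Locks on 'a' and reports whether the current thread thereby holds the monitor of 'b'.
    static boolean sharesMonitor(String a, String b) {
        synchronized (a) {
            return Thread.holdsLock(b);
        }
    }

    public static void main(String[] args) {
        String interned = new String("ORDER-42").intern();
        // Stands in for a string read from I/O or returned by a library: equal content, not interned.
        String fromOutside = new String("ORDER-42");

        System.out.println(matchesByReference(interned, fromOutside));          // false
        System.out.println(interned.equals(fromOutside));                       // true
        System.out.println(matchesByReference(interned, fromOutside.intern())); // true

        // Two independently created but interned strings are one object, hence one lock.
        String lockA = new String("lock").intern();
        String lockB = new String("lock").intern();
        System.out.println(sharesMonitor(lockA, lockB));                        // true
    }
}
```

Using equals() for comparison and a dedicated private lock object for synchronization avoids both problems regardless of whether the strings are interned.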

Solution 2: -XX:+UseStringDeduplication

If you are running Java 8 update 20 or later, you can consider using the '-XX:+UseStringDeduplication' JVM option. Note that on Java 8 this option takes effect only when the G1 garbage collector is in use.

To enable this feature, pass the following JVM options at application startup:

-XX:+UseG1GC -XX:+UseStringDeduplication

The 'UseG1GC' option activates the G1 garbage collector for your application. 'UseStringDeduplication' activates string deduplication as part of garbage collection: the JVM detects String objects with identical contents and makes them share a single underlying character array, so the duplicate arrays can be reclaimed. The String objects themselves remain distinct.
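To confirm that the options actually reached the running JVM, one possibility is to read them back at runtime. The sketch below assumes a HotSpot-based JVM (it uses the com.sun.management.HotSpotDiagnosticMXBean API, which is not part of the Java SE standard) and a version that knows the UseStringDeduplication flag; the class name StringDedupCheck and the helper vmOption are hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class StringDedupCheck {
    // Returns the current value of a HotSpot -XX flag as a string, e.g. "true" or "false".
    // Throws IllegalArgumentException if the JVM does not know the flag.
    static String vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        System.out.println("UseG1GC = " + vmOption("UseG1GC"));
        System.out.println("UseStringDeduplication = " + vmOption("UseStringDeduplication"));
    }
}
```

Running this class with and without '-XX:+UseG1GC -XX:+UseStringDeduplication' should show the flag values change accordingly.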

This strategy has one major advantage over the String#intern() option. Duplicate strings can also originate outside your application's source code, for example in 3rd party libraries, frameworks, and the JDK itself. Since you don't control that source code, you cannot add String#intern() calls to it. 'UseStringDeduplication', on the other hand, operates at the JVM level, so it can also clean up duplicate strings that originate from 3rd party libraries, frameworks, and the JDK.

However, please be advised that 'UseStringDeduplication' has the potential to increase GC pause times, because the deduplication work is tied to the garbage collection phases (full GC, young GC, and mixed GC).

Each application is unique, so no single solution fits all. Please benchmark both of the above solutions and choose the one that works best for you.


If you have any other solutions, feel free to email us at team@tier1app.com. We will be glad to add them here.