Implementing JDK String.repeat in Java 11

6 min read · Sep 24, 2018

Java 11 added a new method to the String class called repeat. Yes, it repeats a string, and it does so fast. Now guess: how many lines does the implementation take? A one-liner? A fancy Stream one-liner? 5–7 lines? 9? 12? More. A lot more. In this article, I want to explore why. Are you ready?

Start like a pro

We dedicate the first lines to checking the incoming parameter. Uncivilized barbarians can provide a negative count, so we need to protect the kingdom and raise the alarm. A zero-length input string or a zero repeat count means no work, so chill and return an empty string. When the repeat count equals one, we return a reference to the string itself. That does no harm, since String is immutable.

if (count < 0) {
    throw new IllegalArgumentException("count is negative: " + count);
}
if (count == 1) {
    return this;
}
final int len = value.length;
if (len == 0 || count == 0) {
    return "";
}
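To see these guards from the outside, here is a minimal sketch that pokes at the public String.repeat method. The class name RepeatEdgeCases and the helper rejects are made up for illustration, and the identity check on repeat(1) relies on the implementation shown above rather than on a documented guarantee.

```java
class RepeatEdgeCases {
    // Hypothetical helper: true when repeat(count) throws IllegalArgumentException.
    static boolean rejects(String s, int count) {
        try {
            s.repeat(count);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String s = "ab";
        System.out.println(rejects(s, -1));         // a negative count is rejected
        System.out.println(s.repeat(0).isEmpty());  // zero repeats give the empty string
        System.out.println("".repeat(5).isEmpty()); // empty input gives the empty string
        // Same reference with the code shown above; treat it as an implementation detail.
        System.out.println(s.repeat(1) == s);
    }
}
```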

Performance achievement number one, single letter optimization

Now let’s create a subtitle generator. Imagine a scene where people are falling into a hole. We need to generate “aaaaaaaaaa” with a length proportional to the depth of the hole. Experts in the field would suggest appending the same letter to a String many times, right? Wrong. This is the perfect case for the Arrays class!

if (len == 1) {
    final byte[] single = new byte[count];
    Arrays.fill(single, value[0]);
    return new String(single, coder);
}

You might not believe it, but Arrays.fill fills the array. What a twist, right? You might expect the method to be complex inside, but it is not. In fact, it is a simple loop. The advantage is that the memory is already allocated in one piece, so we don’t need to stitch the result together from many small concatenations.

for (int i = 0, len = a.length; i < len; i++)
    a[i] = val;
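The constructor that takes a byte array and a coder is internal to the String class, so user code cannot call it. As a rough sketch of the same idea using only public API, the hypothetical helper repeatChar below fills a byte array and decodes it as ISO-8859-1. It assumes the character fits in one byte, and unlike the JDK-internal path it pays for a decoding step.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class SingleCharRepeat {
    // Hypothetical helper: repeats one Latin-1 character `count` times.
    static String repeatChar(char c, int count) {
        if (count < 0) {
            throw new IllegalArgumentException("count is negative: " + count);
        }
        if (c > 0xFF) {
            throw new IllegalArgumentException("not a Latin-1 character: " + (int) c);
        }
        byte[] single = new byte[count];   // memory is allocated once, up front
        Arrays.fill(single, (byte) c);     // one simple loop over the array
        return new String(single, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(repeatChar('a', 10));
    }
}
```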

Too bad: we cannot return a pointer to that array (I’m looking at you, C++), so we need to convert it back to a String. Hey, but why do we need a second argument here, the coder?

new String(single, coder)

String class is bodyguarding memory

Memory is cheap, inexpensive and fast. Unless it gets expensive because we need a lot of it. Spending $500 is better than spending $1000, right? That is the reason behind the coder. When possible, String uses the LATIN1 encoding, so each character eats only 8 bits (one byte). If any character is out of that range, we fall back to UTF16, which takes two bytes per character. We can see the check for a single character in the String.valueOf method:

public static String valueOf(char c) {
    if (COMPACT_STRINGS && StringLatin1.canEncode(c)) {
        return new String(StringLatin1.toBytes(c), LATIN1);
    }
    return new String(StringUTF16.toBytes(c), UTF16);
}

In the repeat method, we pass the coder of the original String to avoid recalculating it. When we already know the characters, there is no need to check them again. Repeating a string cannot introduce any new character codes. No need to work!
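To make the saving concrete, here is a small sketch of the decision. The helpers fitsInLatin1 and payloadBytes are hypothetical and only approximate what the JDK does internally: they count the bytes of the character data and ignore object headers.

```java
class CoderSketch {
    // Hypothetical helper: true when every char fits in one byte (0..255).
    static boolean fitsInLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    // Approximate size of the character data: one byte per char
    // in the Latin-1 layout, two bytes per char in the UTF-16 layout.
    static int payloadBytes(String s) {
        return fitsInLatin1(s) ? s.length() : 2 * s.length();
    }

    public static void main(String[] args) {
        System.out.println(payloadBytes("hole"));         // all chars fit in one byte
        System.out.println(payloadBytes("\u0142\u0142")); // U+0142 needs two bytes per char
    }
}
```

Repeating a string never changes the answer of fitsInLatin1, which is why repeat can reuse the coder it already has.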

You shall not pass

Again, taking care of memory is a priority for a class as critical as String. In some cases, repeating is going to end in an out-of-memory error no matter what. Can we predict that? Can we avoid wasting time on an operation that will fail anyway? A magic ball or a tarot reading might work here, but they are not accurate. Instead, we can calculate the resulting length up front.

if (Integer.MAX_VALUE / count < len) {
    throw new OutOfMemoryError("Repeating " + len + " bytes String " + count +
            " times will produce a String exceeding maximum size.");
}

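Integer.MAX_VALUE is 0x7fffffff, which is 2³¹-1. If the produced String would need more bytes than that (about 2 GB), we can fail immediately. The check uses a division rather than len * count because the multiplication itself could overflow an int.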
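Why divide instead of simply multiplying? A short sketch, with made-up helper names exceedsMaxSize and naiveLength, shows how the naive product silently wraps around while the division-based guard still gives the right answer. It assumes both len and count are positive, as they are at this point in repeat.

```java
class RepeatOverflow {
    // Same comparison as the guard above; assumes len > 0 and count > 0.
    static boolean exceedsMaxSize(int len, int count) {
        return Integer.MAX_VALUE / count < len;
    }

    // Naive length computation that can overflow without any warning.
    static int naiveLength(int len, int count) {
        return len * count;
    }

    public static void main(String[] args) {
        // 65536 * 65536 is 2^32, which wraps around to 0 in int arithmetic.
        System.out.println(naiveLength(65536, 65536));
        System.out.println(exceedsMaxSize(65536, 65536));
        // 2 * 1_000_000_000 still fits below Integer.MAX_VALUE.
        System.out.println(exceedsMaxSize(2, 1_000_000_000));
    }
}
```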

Ladies and gentlemen, our anticipated star

final int limit = len * count;
final byte[] multiple = new byte[limit];
System.arraycopy(value, 0, multiple, 0, len);
int copied = len;
for (; copied < limit - copied; copied <<= 1) {
    System.arraycopy(multiple, 0, multiple, copied, copied);
}
System.arraycopy(multiple, 0, multiple, copied, limit - copied);
return new String(multiple, coder);

So this is the actual implementation of String repeat. Before we walk through it, let’s do a short recap: what is System.arraycopy? It is one of the main building blocks of the repeat code, so we need to know. Again we have an array operation. This time, we are copying a fragment of one array into another place.

char[] a = "abcdefghi".toCharArray();
char[] b = "123456789".toCharArray();
System.arraycopy(a, 2, b, 4, 3);
System.out.println(a); // abcdefghi
System.out.println(b); // 1234cde89
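System.arraycopy also accepts the same array as both source and destination, which is the pattern repeat relies on. A minimal sketch, with a hypothetical class name SelfCopy and a dot-filled buffer chosen only to make the result easy to read:

```java
class SelfCopy {
    // Copies the filled prefix of the buffer into its own free space, twice.
    static char[] copyWithin() {
        char[] buf = "ab......".toCharArray();
        System.arraycopy(buf, 0, buf, 2, 2); // buffer is now abab....
        System.arraycopy(buf, 0, buf, 4, 4); // buffer is now abababab
        return buf;
    }

    public static void main(String[] args) {
        System.out.println(copyWithin()); // abababab
    }
}
```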

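Let’s look at the source code of System.arraycopy. It doesn’t tell us much, since it is a native method: the body lives inside the JVM rather than in Java code. The @HotSpotIntrinsicCandidate annotation marks it as a method that HotSpot may replace with a hand-optimized intrinsic, so the copy can be executed very efficiently at runtime.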

/**
 * Copies an array from the specified source array, beginning at the
 * specified position, to the specified position of the destination array.
 ...
 * @param src the source array.
 * @param srcPos starting position in the source array.
 * @param dest the destination array.
 * @param destPos starting position in the destination data.
 * @param length the number of array elements to be copied.
 ...
 */
@HotSpotIntrinsicCandidate
public static native void arraycopy(Object src, int srcPos,
                                    Object dest, int destPos,
                                    int length);

The main code

First, we are making space for the output string:

final int limit = len * count;
final byte[] multiple = new byte[limit];

Then, the original string is transferred to the new array (the first occurrence):

System.arraycopy(value, 0, multiple, 0, len);

Then comes a loop that pulls off two tricks:

int copied = len;
for (; copied < limit - copied; copied <<= 1) {
    System.arraycopy(multiple, 0, multiple, copied, copied);
}

First, only the target array is used: it is both the source and the destination of every copy. Second, each iteration copies twice as much as the previous one. After the first iteration we have two occurrences, then four, eight, sixteen, etc.

1: copycopy............................
2: copycopycopycopy....................
3: copycopycopycopycopycopycopycopy....
4: ?

Each iteration copies more efficiently than the last, since a bulk copy works best on bigger chunks, and the whole array is filled in a number of copies that grows only logarithmically with count. Of course, at some point we cannot double the data any more, because there is not enough space left in the array. The final step is to fill the remaining space and construct the result:

System.arraycopy(multiple, 0, multiple, copied, limit - copied);
return new String(multiple, coder);

In the end, we have our repeated string.

3: copycopycopycopycopycopycopycopy....
4: copycopycopycopycopycopycopycopycopy
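The internal value array and coder are not reachable from user code, so the following is only a sketch of the same doubling technique on top of public API. The helpers repeatBytes and repeat are hypothetical names, the UTF-8 round trip is an assumption made to get at some bytes, and the usual guards are left out: the sketch assumes a non-empty input, a positive count and a product that fits in an int.

```java
import java.nio.charset.StandardCharsets;

class DoublingRepeat {
    // Hypothetical helper: repeats `src` `count` times with the doubling copy.
    // Assumes src.length > 0, count > 0 and src.length * count fits in an int.
    static byte[] repeatBytes(byte[] src, int count) {
        final int len = src.length;
        final int limit = len * count;
        final byte[] multiple = new byte[limit];
        // First occurrence comes from the source array.
        System.arraycopy(src, 0, multiple, 0, len);
        int copied = len;
        // Double the filled prefix while the doubled size still fits.
        for (; copied < limit - copied; copied <<= 1) {
            System.arraycopy(multiple, 0, multiple, copied, copied);
        }
        // Fill whatever space is left (possibly zero bytes).
        System.arraycopy(multiple, 0, multiple, copied, limit - copied);
        return multiple;
    }

    // Convenience wrapper; the UTF-8 encoding step is not part of the JDK code.
    static String repeat(String s, int count) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(repeatBytes(bytes, count), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(repeat("copy", 9));
    }
}
```

For "copy" repeated 9 times, limit is 36 and copied goes 4, 8, 16, 32 across three loop iterations. The final arraycopy then fills the last 4 bytes, which matches the diagram above.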

Just before we finish

Let’s look at the entire method source code.

/**
 * Returns a string whose value is the concatenation of this
 * string repeated {@code count} times.
 * <p>
 * If this string is empty or count is zero then the empty
 * string is returned.
 *
 * @param   count number of times to repeat
 *
 * @return  A string composed of this string repeated
 *          {@code count} times or the empty string if this
 *          string is empty or count is zero
 *
 * @throws  IllegalArgumentException if the {@code count} is
 *          negative.
 *
 * @since 11
 */
public String repeat(int count) {
    if (count < 0) {
        throw new IllegalArgumentException("count is negative: " + count);
    }
    if (count == 1) {
        return this;
    }
    final int len = value.length;
    if (len == 0 || count == 0) {
        return "";
    }
    if (len == 1) {
        final byte[] single = new byte[count];
        Arrays.fill(single, value[0]);
        return new String(single, coder);
    }
    if (Integer.MAX_VALUE / count < len) {
        throw new OutOfMemoryError("Repeating " + len + " bytes String " + count +
                " times will produce a String exceeding maximum size.");
    }
    final int limit = len * count;
    final byte[] multiple = new byte[limit];
    System.arraycopy(value, 0, multiple, 0, len);
    int copied = len;
    for (; copied < limit - copied; copied <<= 1) {
        System.arraycopy(multiple, 0, multiple, copied, copied);
    }
    System.arraycopy(multiple, 0, multiple, copied, limit - copied);
    return new String(multiple, coder);
}

Do not do this at home!

Writing code like this in an average system is like using an F1 car to drive kids to school: unjustified madness. But from the perspective of a JDK developer, it is different. This is the kind of code used to actually build the parts of the F1 car.

Written by Grzegorz Gajos

Software Engineer, Co-founder of opentangerine.com. Owner of real, brutal, hands-on experience in diverse IT fields since 2009.
