5 Things of Java - Serialization
developerWorks
ibm.com/developerWorks
5 things you didn't know about ... Java Object Serialization
Copyright IBM Corporation 2010. All rights reserved.

06 Apr 2010 (updated 26 Apr 2010)

Java Object Serialization is so fundamental to Java programming that it's easy to take for granted. But, like many aspects of the Java platform, Serialization rewards those who go digging. In his first article of this new series, Ted Neward gives you five reasons to look twice at the Java Object Serialization API, including tricks (and code) for refactoring, encrypting, and validating serialized data.

About this series
So you think you know about Java programming? The fact is, most developers scratch the surface of the Java platform, learning just enough to get the job done. In this series, Ted Neward digs beneath the core functionality of the Java platform to uncover little-known facts that could help you solve even the stickiest programming challenges.

A few years ago, while working with a software team writing an application in the Java language, I experienced the benefit of knowing a little more than your average programmer about Java Object Serialization. A year or so prior, a developer responsible for managing the application's per-user settings had decided to store them in a Hashtable, then serialize the Hashtable down to disk for persistence. When a user changed his or her settings, the Hashtable was simply rewritten back to disk.

This was an elegant and open-ended settings system, but it fell apart when the team decided to migrate from Hashtable to HashMap from the Java Collections library. The disk forms of Hashtable and HashMap are different and incompatible. Short of running some kind of data conversion utility over each of the persisted user settings (a monumental task), it seemed that Hashtable would be the application's storage format for the remainder of its lifetime.

The team felt stuck, but only because they didn't know something crucial (and somewhat obscure) about Java Serialization: it was built to allow for evolution of types over time. Once I showed them how to do automatic serialization replacement, the transition to HashMap proceeded as planned.

This article is the first in a series dedicated to uncovering useful trivia about the Java platform: obscure stuff that comes in handy for solving Java programming challenges. Java Object Serialization is a great API to start with because it's been around since the beginning: JDK 1.1. The five things you'll learn about Serialization in this article should convince you to look twice at even standard Java APIs.
public class Person implements java.io.Serializable {
    public Person(String fn, String ln, int a) {
        this.firstName = fn;
        this.lastName = ln;
        this.age = a;
    }

    public String getFirstName() { return firstName; }
    public String getLastName() { return lastName; }
    public int getAge() { return age; }
    public Person getSpouse() { return spouse; }

    public void setFirstName(String value) { firstName = value; }
    public void setLastName(String value) { lastName = value; }
    public void setAge(int value) { age = value; }
    public void setSpouse(Person value) { spouse = value; }

    public String toString() {
        // Guard against a missing spouse so toString() never throws
        return "[Person: firstName=" + firstName +
            " lastName=" + lastName +
            " age=" + age +
            " spouse=" + (spouse != null ? spouse.getFirstName() : "[null]") +
            "]";
    }

    private String firstName;
    private String lastName;
    private int age;
    private Person spouse;
}
Once Person has been serialized, it's pretty simple to write an object graph to disk and read it back again, as demonstrated by this JUnit 4 unit test.

Listing 2. Deserializing Person

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.fail;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

import org.junit.Test;

public class SerTest {
    @Test
    public void serializeToDisk() {
        // Write the object graph (ted and charl reference each other) to disk
        try {
            com.tedneward.Person ted = new com.tedneward.Person("Ted", "Neward", 39);
            com.tedneward.Person charl = new com.tedneward.Person("Charlotte", "Neward", 38);
            ted.setSpouse(charl);
            charl.setSpouse(ted);

            FileOutputStream fos = new FileOutputStream("tempdata.ser");
            ObjectOutputStream oos = new ObjectOutputStream(fos);
            oos.writeObject(ted);
            oos.close();
        } catch (Exception ex) {
            fail("Exception thrown during test: " + ex.toString());
        }

        // Read the graph back and check that both objects survived
        try {
            FileInputStream fis = new FileInputStream("tempdata.ser");
            ObjectInputStream ois = new ObjectInputStream(fis);
            com.tedneward.Person ted = (com.tedneward.Person) ois.readObject();
            ois.close();

            assertEquals("Ted", ted.getFirstName());
            assertEquals("Charlotte", ted.getSpouse().getFirstName());

            // Clean up the file
            new File("tempdata.ser").delete();
        } catch (Exception ex) {
            fail("Exception thrown during test: " + ex.toString());
        }
    }
}
Nothing you've seen so far is new or exciting (it's Serialization 101), but it's a good place to start. We'll use Person to discover five things you probably didn't already know about Java Object Serialization.
}

public class Person implements java.io.Serializable {
    public Person(String fn, String ln, int a, Gender g) {
        this.firstName = fn;
        this.lastName = ln;
        this.age = a;
        this.gender = g;
    }

    public String getFirstName() { return firstName; }
    public String getLastName() { return lastName; }
    public Gender getGender() { return gender; }
    public int getAge() { return age; }
    public Person getSpouse() { return spouse; }

    public void setFirstName(String value) { firstName = value; }
    public void setLastName(String value) { lastName = value; }
    public void setGender(Gender value) { gender = value; }
    public void setAge(int value) { age = value; }
    public void setSpouse(Person value) { spouse = value; }

    public String toString() {
        // Guard against a missing spouse so toString() never throws
        return "[Person: firstName=" + firstName +
            " lastName=" + lastName +
            " gender=" + gender +
            " age=" + age +
            " spouse=" + (spouse != null ? spouse.getFirstName() : "[null]") +
            "]";
    }

    private String firstName;
    private String lastName;
    private int age;
    private Person spouse;
    private Gender gender;
}
Serialization uses a calculated hash based on just about everything in a given class (method names, field names, field types, access modifiers, you name it) and compares that hash value against the hash value in the serialized stream. To convince the Java runtime that the two types are in fact the same, the second and subsequent versions of Person must have the same serialization version hash (stored as the private static final serialVersionUID field) as the first one. What we need, therefore, is the serialVersionUID value, which is calculated by running the JDK serialver command against the original (or V1) version of the Person class.

Once we have Person's serialVersionUID, not only can we create PersonV2 objects out of the original object's serialized data (any new fields take the default value for their type, most often null), but the opposite is also true: we can deserialize original Person objects out of PersonV2 data, with no added fuss.
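To make the version hash a little more concrete, here is a minimal sketch that asks the runtime which serialVersionUID it will use for a class, which is the same number serialver reports. The class names (SerialVersionDemo, PinnedPerson, UnpinnedPerson) and the value 42L are hypothetical and exist only for illustration; in real code you would paste in the value computed from your own V1 class.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical demo class; the nested types stand in for versions of Person.
final class SerialVersionDemo {

    // Pins the version hash explicitly, so later edits to the class
    // do not change the value the runtime compares against the stream.
    static class PinnedPerson implements Serializable {
        private static final long serialVersionUID = 42L; // illustrative value only
        String firstName;
        String lastName;
        int age;
    }

    // No explicit field: the runtime computes a hash from the class's shape,
    // so adding a field or method would change it.
    static class UnpinnedPerson implements Serializable {
        String firstName;
        String lastName;
        int age;
    }

    // Returns the UID the serialization runtime will use for the given class.
    static long uidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("pinned:   " + uidOf(PinnedPerson.class));
        System.out.println("computed: " + uidOf(UnpinnedPerson.class));
    }
}
```

The pinned class always reports the declared value, while the computed one depends on the class's members, which is why a V2 class without the field fails to read V1 data.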
It often comes as an unpleasant surprise to Java developers that the Serialization binary format is fully documented and entirely reversible. In fact, just dumping the contents of the binary serialized stream to the console is sufficient to figure out what the class looks like and contains.

This has some disturbing implications vis-a-vis security. When making remote method calls via RMI, for example, any private fields in the objects being sent across the wire appear in the socket stream as almost plain text, which clearly violates even the simplest security concerns.

Fortunately, Serialization gives us the ability to "hook" the serialization process and secure (or obscure) the field data both before serialization and after deserialization. We can do this by providing a writeObject method on a Serializable object.

Obscuring serialized data

Suppose the sensitive data in the Person class were the age field; after all, a lady never reveals her age and a gentleman never tells. We can obscure this data by rotating the bits once to the left before serialization, and then rotating them back after deserialization. (I'll leave it to you to develop a more secure algorithm; this one's just for example's sake.)

To "hook" the serialization process, we'll implement a writeObject method on Person; and to "hook" the deserialization process, we'll implement a readObject method on the same class. It's important to get the details right on both of these: if the access modifier, parameters, or name are at all different from what's shown in Listing 4, the code will silently fail, and our Person's age will be visible to anyone who looks.

Listing 4. Obscuring serialized data
public class Person implements java.io.Serializable {
    public Person(String fn, String ln, int a) {
        this.firstName = fn;
        this.lastName = ln;
        this.age = a;
    }

    public String getFirstName() { return firstName; }
    public String getLastName() { return lastName; }
    public int getAge() { return age; }
    public Person getSpouse() { return spouse; }

    public void setFirstName(String value) { firstName = value; }
    public void setLastName(String value) { lastName = value; }
    public void setAge(int value) { age = value; }
    public void setSpouse(Person value) { spouse = value; }

    private void writeObject(java.io.ObjectOutputStream stream)
            throws java.io.IOException {
        // "Encrypt"/obscure the sensitive data: rotate the bits once to the left
        int plainAge = age;
        age = Integer.rotateLeft(plainAge, 1);
        try {
            stream.defaultWriteObject();
        } finally {
            // Restore the in-memory value so the live object is left unchanged
            age = plainAge;
        }
    }

    private void readObject(java.io.ObjectInputStream stream)
            throws java.io.IOException, ClassNotFoundException {
        stream.defaultReadObject();
        // "Decrypt"/de-obscure the sensitive data: rotate the bits back
        age = Integer.rotateRight(age, 1);
    }

    public String toString() {
        return "[Person: firstName=" + firstName +
            " lastName=" + lastName +
            " age=" + age +
            " spouse=" + (spouse != null ? spouse.getFirstName() : "[null]") +
            "]";
    }

    private String firstName;
    private String lastName;
    private int age;
    private Person spouse;
}
If we need to see the obscured data, we can always just look at the serialized data stream/file. And, because the format is fully documented, it's possible to read the contents of the serialized stream without the class being available.
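To see just how readable the stream is, the following sketch serializes a small object into memory and searches the raw bytes for its field name and value. It assumes Java 8 or later; StreamPeekDemo and Secretive are hypothetical names invented for this illustration and are not part of the article's sample code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: peeks at the raw bytes of a serialized object.
final class StreamPeekDemo {

    // A class whose fields are private and have no accessors at all.
    static class Secretive implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String lastName;
        private final int age;

        Secretive(String lastName, int age) {
            this.lastName = lastName;
            this.age = age;
        }
    }

    // Serializes the object into an in-memory byte array.
    static byte[] serialize(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Maps each byte to one char so ASCII text in the stream is searchable.
    static String asText(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] raw = serialize(new Secretive("Neward", 39));
        String text = asText(raw);
        // Every serialization stream starts with the magic number 0xACED
        System.out.printf("magic: %02X%02X%n", raw[0] & 0xFF, raw[1] & 0xFF);
        System.out.println("field name visible:  " + text.contains("lastName"));
        System.out.println("field value visible: " + text.contains("Neward"));
    }
}
```

Both checks print true: the private modifier protects the field from other code, not from anyone holding the bytes.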
From time to time, a class contains a core element of data from which the rest of the class's fields can be derived or retrieved. In those cases, serializing the entirety of the object is unnecessary. You could mark the fields transient, but the class would still have to explicitly produce code to check whether a field was initialized every time a method accessed it.

Given the principal concern is serialization, it's better to nominate a flyweight or proxy to go into the stream instead. Providing a writeReplace method on the original Person allows a different kind of object to be serialized in its place; similarly, if a readResolve method is found during deserialization, it is called to supply a replacement object back to the caller.

Packing and unpacking the proxy

Together, the writeReplace and readResolve methods enable a Person class to pack a PersonProxy with all of its data (or some core subset of it), put it into a stream, and then unwind the packing later when it is deserialized.

Listing 5. You complete me, I replace you
class PersonProxy implements java.io.Serializable {
    public PersonProxy(Person orig) {
        // Pack the person (and the spouse, if any) into one comma-separated string
        data = orig.getFirstName() + "," + orig.getLastName() + "," + orig.getAge();
        if (orig.getSpouse() != null) {
            Person spouse = orig.getSpouse();
            data = data + "," + spouse.getFirstName() + "," + spouse.getLastName() + ","
                + spouse.getAge();
        }
    }

    public String data;

    private Object readResolve() throws java.io.ObjectStreamException {
        // Unpack the string and hand a real Person back to the caller
        String[] pieces = data.split(",");
        Person result = new Person(pieces[0], pieces[1], Integer.parseInt(pieces[2]));
        if (pieces.length > 3) {
            result.setSpouse(new Person(pieces[3], pieces[4], Integer.parseInt(pieces[5])));
            result.getSpouse().setSpouse(result);
        }
        return result;
    }
}

public class Person implements java.io.Serializable {
    public Person(String fn, String ln, int a) {
        this.firstName = fn;
        this.lastName = ln;
        this.age = a;
    }

    public String getFirstName() { return firstName; }
    public String getLastName() { return lastName; }
    public int getAge() { return age; }
    public Person getSpouse() { return spouse; }

    private Object writeReplace() throws java.io.ObjectStreamException {
        // Serialize the proxy in place of this object
        return new PersonProxy(this);
    }

    public void setFirstName(String value) { firstName = value; }
    public void setLastName(String value) { lastName = value; }
    public void setAge(int value) { age = value; }
    public void setSpouse(Person value) { spouse = value; }

    public String toString() {
        return "[Person: firstName=" + firstName +
            " lastName=" + lastName +
            " age=" + age +
            " spouse=" + (spouse != null ? spouse.getFirstName() : "[null]") +
            "]";
    }

    private String firstName;
    private String lastName;
    private int age;
    private Person spouse;
}
Note that the PersonProxy has to track all of Person's data. Often this means the proxy will need to be an inner class of Person to have access to private fields. The proxy will also sometimes need to track down other object references and serialize them manually, such as Person's spouse.

This trick is one of the few that isn't required to be read/write balanced. For instance, a version of a class that's been refactored into a different type could provide a readResolve method to silently transition a serialized object over to a new type. Similarly, it could employ the writeReplace method to take old classes and serialize them into new versions.
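As a rough illustration of the unbalanced case, the sketch below gives a legacy type a readResolve method that hands back a newer type, so callers reading old data never see the old class. OldSettings, NewSettings, and ReadResolveMigrationDemo are hypothetical names chosen for this example (loosely echoing the Hashtable-to-HashMap story from the introduction); this is one possible arrangement under those assumptions, not the code from the project described there.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

// Hypothetical demo: an old serialized type resolves into a new one on read.
final class ReadResolveMigrationDemo {

    // Hypothetical replacement type, backed by a HashMap.
    static class NewSettings implements Serializable {
        private static final long serialVersionUID = 1L;
        HashMap<String, String> values;

        NewSettings(Map<String, String> source) {
            this.values = new HashMap<>(source);
        }
    }

    // Hypothetical legacy type; previously written streams still name this class.
    static class OldSettings implements Serializable {
        private static final long serialVersionUID = 1L;
        Hashtable<String, String> table = new Hashtable<>();

        // Called by the runtime after this object has been read; whatever it
        // returns is handed to the caller of readObject() instead.
        private Object readResolve() {
            return new NewSettings(table);
        }
    }

    // Serializes the object to memory and immediately reads it back.
    static Object roundTrip(Object obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        OldSettings old = new OldSettings();
        old.table.put("theme", "dark");

        Object result = roundTrip(old);
        System.out.println(result.getClass().getSimpleName()); // prints "NewSettings"
        System.out.println(((NewSettings) result).values);     // prints "{theme=dark}"
    }
}
```

Note that only the read side changed: nothing here defines a writeReplace method, which is the sense in which the trick need not be balanced.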
In conclusion
Java Object Serialization is more flexible than most Java developers realize, giving us ample opportunity to hack out of sticky situations. Fortunately, coding gems like these are scattered all across the JVM. It's just a matter of knowing about them, and keeping them handy for when a brain-stumper presents itself. Next up in the series: Java Collections. Until then, have fun twisting Serialization to your evil will!
Downloads

Description                    Name              Size   Download method
Sample code for this article   5things1-src.zip  10KB   HTTP
Trademarks
Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.