Serialization in web apps/JavaScript

Serialization is a problem that pops up in any software that persists data. In web apps, it announces itself as soon as your requirements go just beyond CRUD. You suddenly have all these data models that need to be represented in local memory at runtime and also saved persistently in some form.

JSON

Serialization is actually a laborious problem that touches pretty much your entire codebase.

You might think "but JavaScript objects are basically JSON, right?" Well, yes, they are. "And JSON is just a serialization format, right?" Well, yes... but it's easy to overestimate how much it does for you (which is fine, and is not a valid criticism of JSON).

The limitations show up very early on: JSON won't help you deserialize a date, and although it will help you serialize var steve = new Person('Steve') into {"name": "Steve"}, it won't help you deserialize that into something that fulfils person.getName() => 'Steve'.
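
To make those limits concrete, here is a small sketch of a plain JSON round trip. The Person constructor is an assumed minimal version (its real definition isn't shown in this post), and the owner and created fields are made up for illustration.

```javascript
// Assumed minimal Person; the real class is not shown in this post.
function Person(name) { this.name = name; }
Person.prototype.getName = function () { return this.name; };

var original = { owner: new Person('Steve'), created: new Date(0) };
var roundTripped = JSON.parse(JSON.stringify(original));

// The date comes back as an ISO string, not a Date.
var createdIsDate = roundTripped.created instanceof Date;               // false
var createdType = typeof roundTripped.created;                          // 'string'

// The data survives, but the prototype (and so getName) does not.
var ownerName = roundTripped.owner.name;                                // 'Steve'
var ownerHasGetName = typeof roundTripped.owner.getName === 'function'; // false
```

Nothing is lost on the way out; the problem is that nothing in the output says how to rebuild the original types on the way back in.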

Types

The way I've handled this in the past is to have a central serialization system which handles wrapping things up into JSON and unwrapping it into your application's data structures, by augmenting JSON representations with type data, and delegating down to data-structure specific implementations of serialize and deserialize.

I'm not talking about the JSON.stringify() call here, which is trivial. I'm talking about representing my model in a way that is JSON-compatible and also includes all the information necessary to deserialize it automatically.

For example, JSON has no date type, so we have to represent a date as a JSON primitive and do some magic either side of the serialization to convert it to a primitive and to turn it back into a JavaScript date. A serializer for date might just convert the date into a time string, like:

function serializeDate(date) {
    // ISO 8601 is unambiguous and keeps milliseconds; the output of
    // toString() varies between environments and drops them.
    return date.toISOString();
}

This will be invoked as part of a larger routine, which uses the above function to populate the 'value' field, and itself populates the $type field:

{
    "$type": "date",
    "value": "2014-04-28T00:00:00.000Z"
}

A deserializer for this might look like:

function deserializeDate(value, cb) {
    cb(null, new Date(value));
}

This again runs as part of a larger routine, which inspects the $type field, looks up deserializeDate and passes it the value field.

Deserializers are best made asynchronous by default, because sooner or later you'll want to represent references to objects that are stored remotely and need an HTTP request to retrieve, and it's hard to go from synchronous to asynchronous after the fact.
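
As a rough illustration of why the callback signature pays off, here is a sketch where a purely synchronous deserializer and a remote one look identical to the caller. deserializeRemote and deserializeWith are hypothetical names, and setTimeout merely stands in for an HTTP request.

```javascript
// Synchronous work, but exposed through a callback.
function deserializeDate(value, cb) {
    cb(null, new Date(value));
}

// Hypothetical remote lookup; setTimeout stands in for an HTTP request.
function deserializeRemote(value, cb) {
    setTimeout(function () {
        cb(null, { id: value, loaded: true });
    }, 0);
}

// The caller never needs to know which kind it was handed.
function deserializeWith(deserializer, value, cb) {
    deserializer(value, cb);
}

var date = null;
deserializeWith(deserializeDate, '2014-04-28T00:00:00.000Z', function (err, v) {
    date = v;   // set immediately, because deserializeDate calls back synchronously
});

var remote = null;
deserializeWith(deserializeRemote, 3, function (err, v) {
    remote = v; // set later, once the simulated request completes
});
```

Had deserializeDate returned its value directly, every caller would need rewriting on the day the first remote type appeared.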

deserialize()

To tie everything together, you have a top level deserialize(data, cb) function. It's useful if this function is fairly intelligent and handles anything you throw at it. For convenience, I've added the ability to recursively deserialize arbitrary JS maps, because later on it's useful to just throw a block of data at this function and get a block back, instead of having to queue up a lot of calls. I have not implemented the same case for Array in an attempt to keep the code brief, but you should consider doing so.

In the code below I've used Async and Underscore.

var typeMap = {
    'date' : deserializeDate
};

function deserialize(data, cb) {

    // JSON primitives are handled easily
    var primitives = {
        'string' : true,
        'boolean': true,
        'number': true
    };
    if (primitives[typeof data] || data == null) {
        // This is a primitive (or null/undefined) - we can just return it.
        cb(null, data);
        return;
    }

    var typeField = data['$type'];
    var deserializer = typeField && typeMap[typeField];
    if (typeof deserializer === 'function') {
        // This is an object conforming to our ($type, value) structure.
        deserializer(data['value'], cb);
        return;
    }

    // Handle arbitrary JS objects by recursively deserializing their contents.
    if (data.constructor === Object) {
        var ret = {};
        async.each(_.keys(data), function(key, done) {
            deserialize(data[key], function(err, value) {
                if (err) { done(err); return; }
                ret[key] = value;
                done();
            });
        }, function(err) {
            if (err) { cb(err); } else { cb(null, ret); }
        });
        return;
    }

    // Anything else (e.g. an Array) is not supported here. Report it
    // rather than leaving the caller waiting forever.
    cb(new Error('deserialize: unsupported value'));
}
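
For the Array case left out above, one possible shape is sketched below. It is written without Async or Underscore so that it stands alone; deserializeArray is a hypothetical helper, and deserializeItem stands in for the deserialize(data, cb) function.

```javascript
// Hypothetical helper: deserializes every element and preserves order.
// deserializeItem has the same (data, cb) signature as deserialize().
function deserializeArray(items, deserializeItem, cb) {
    var results = new Array(items.length);
    var remaining = items.length;
    var failed = false;

    if (remaining === 0) { cb(null, results); return; }

    items.forEach(function (item, index) {
        deserializeItem(item, function (err, value) {
            if (failed) { return; }                    // an error was already reported
            if (err) { failed = true; cb(err); return; }
            results[index] = value;                    // slot by index, not arrival order
            remaining -= 1;
            if (remaining === 0) { cb(null, results); }
        });
    });
}

// Usage with a trivial stand-in deserializer that doubles numbers.
var doubled = null;
deserializeArray([1, 2, 3], function (x, cb) { cb(null, x * 2); }, function (err, out) {
    doubled = out;   // [2, 4, 6]
});
```

Since Async is already in use here, async.map(data, deserialize, cb) should cover the same ground in one line, inside a branch guarded by _.isArray(data).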

I've omitted the serialization code, but it's very much the same idea. You have a top level serializer that generates JSON-compatible objects by delegating to data-type specific serializers.
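
For a rough idea of what that could look like, here is a minimal sketch. The serializers registry and its matches/serialize fields are my own invention for this example, not a fixed design, and the date serializer uses toISOString() as one reasonable choice of string format. It is synchronous because nothing on the way out needs a network round trip.

```javascript
// Hypothetical registry: each entry pairs a $type name with a test
// and a data-type specific serializer.
var serializers = [
    {
        type: 'date',
        matches: function (v) { return v instanceof Date; },
        serialize: function (d) { return d.toISOString(); }
    }
];

function serialize(data) {
    // Primitives, null and undefined pass straight through.
    if (data === null || typeof data !== 'object') { return data; }

    // Arrays are mapped element by element.
    if (Array.isArray(data)) {
        return data.map(function (item) { return serialize(item); });
    }

    // Known types are wrapped in the ($type, value) structure.
    for (var i = 0; i < serializers.length; i++) {
        if (serializers[i].matches(data)) {
            return { '$type': serializers[i].type, 'value': serializers[i].serialize(data) };
        }
    }

    // Anything else is treated as a plain map and serialized recursively.
    var out = {};
    Object.keys(data).forEach(function (key) {
        out[key] = serialize(data[key]);
    });
    return out;
}

var serialized = serialize({ name: 'Steve', dob: new Date(0) });
// { name: 'Steve', dob: { $type: 'date', value: '1970-01-01T00:00:00.000Z' } }
```

The result is a plain JSON-compatible object, so the final JSON.stringify() call stays as trivial as it should be.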

Deserializing your own data structures

That was pretty easy. The harder parts come when you consider your own objects.
Let's say you have a Person class. It's useful to have a way to merge serialized data into an existing Person (because this allows us to re-purpose our serialization code for handling live-update events), and it's also useful to have a way to create a new Person. Luckily, the second is just a trivial special case of the first.

Person.fromSerialized = function(obj, cb) {
    var p = new Person();
    p.mergeFromSerialized(obj, cb);
}

// Add this to the typeMap, so it becomes visible to deserialize()
typeMap['Person'] = Person.fromSerialized

Person.prototype.mergeFromSerialized = function(obj, cb) {

    // The correct way is to use deserialize(). This looks recursive, but 
    // the subtlety is that obj is not a typed object - it's just a plain 
    // JS map of properties. Because of this, the deserializer won't 
    // try to delegate and will just deserialize each member

    function take(object, key, defaultValue) {
        if (object.hasOwnProperty(key)) { 
            return object[key]; 
        }
        else { 
            return defaultValue;
        }
    }

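    // deserialize() calls back with (err, value), so take both arguments.
    deserialize(obj, _.bind(function(err, data) {
        if (err) { cb(err); return; }
        this.name = take(data, 'name', this.name);
        this.dob = take(data, 'dob', this.dob);
        cb(null, this);
    }, this));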
}

take() is a helper function which returns the given key from an object unless that key doesn't exist, in which case it returns a default. It avoids a lot of if (data.hasOwnProperty()) {} blocks and makes the code a bit more legible.
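
The "key exists" test matters more than it might seem, particularly for live updates. The sketch below contrasts take() with the tempting `||` shortcut; the nickname field is made up for illustration, and take() is restated in a compact form so the block runs on its own.

```javascript
// Same behaviour as take() above, restated compactly.
function take(object, key, defaultValue) {
    return Object.prototype.hasOwnProperty.call(object, key) ? object[key] : defaultValue;
}

var current = { name: 'Steve', nickname: 'Stevie' };
var update = { nickname: '' };   // a live update that clears the nickname

var name = take(update, 'name', current.name);              // 'Steve': key absent, keep current
var nickname = take(update, 'nickname', current.nickname);  // '': key present, apply the update
var viaOr = update.nickname || current.nickname;            // 'Stevie': the update is lost
```

Because absent keys fall back to the current value, a partial update touches only the fields it mentions.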

What about references?

We very quickly encounter yet another case: objects need shared references. If a Person object has a friends array, you don't want to embed each friend's full data in that array, you just want to store a reference. JSON has no reference support, so you need to encode your own.

In this case you need a way to refer to the object itself, not its contents.

The way you handle this is to ID persistent objects and serialize them as special '$ref' objects, and expose a method on your server to return a specific object with the given ID.

{
    "$type": "Person",
    "value": {
         "name": "Steve",
         "friends": [
             { 
                 "$ref" : {
                     "$collection": "People",
                     "id": 3
                 }
             }
         ]
     }
}

This presents an interesting point: that you need two different serialized representations for objects that may exist in collections. One returns a normal keyed object representing the object's state, the other returns a $ref object. I've found the latter case to be the generally useful one, and the former to be a special case which the caller should only invoke purposefully. Meaning: serialize(myPerson) => {$ref: ... }, and myPerson.serialize() => {name: ..., dob: ..., friends: ... }.
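
A stripped-down sketch of those two representations follows. The Person constructor with an id argument is an assumption for this example, and this serialize() handles only the Person case to keep the block short.

```javascript
// Assumed minimal Person with an id; the real class is not shown here.
function Person(id, name) {
    this.id = id;
    this.name = name;
    this.friends = [];
}

// Default form: anything that lives in a collection becomes a $ref.
function serialize(data) {
    if (data instanceof Person) {
        return { '$ref': { '$collection': 'People', 'id': data.id } };
    }
    return data; // other cases omitted in this sketch
}

// Full state: only call this on purpose, e.g. when saving this one object.
// Nested people go through serialize(), so they come out as references.
Person.prototype.serialize = function () {
    return {
        name: this.name,
        friends: this.friends.map(function (friend) { return serialize(friend); })
    };
};

var bob = new Person(3, 'Bob');
var steve = new Person(1, 'Steve');
steve.friends.push(bob);

var asRef = serialize(steve);     // { $ref: { $collection: 'People', id: 1 } }
var asState = steve.serialize();  // { name: 'Steve', friends: [ { $ref: { ..., id: 3 } } ] }
```

Making the reference form the default means an object graph with cycles (Steve and Bob listing each other as friends) can't send the serializer into infinite recursion.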

The deserializer for a $ref looks something like:

var collections = {};
function deserializeRef(data, cb) {
    
    var collectionName = data['$collection'],
        id = data['id'];
    if (!collections[collectionName]) {
        collections[collectionName] = {};
    }
    var collection = collections[collectionName];
    if (collection[id]) { cb(null, collection[id]); }
    else {
        yourServerApi.getCollectionElement(collectionName, id, 
                                           function(err, response) {
            if (err) { cb(err); return }
            deserialize(response, function(err, response) {
                if (err) { cb(err); return }
                collection[id] = response;
                cb(err, collection[id]);
            });
        });
    }
}
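
One thing not shown is how a $ref object reaches deserializeRef in the first place, since it carries a $ref key rather than a $type. A possible sketch of that routing is below; isRef and makeRefAwareDeserializer are hypothetical names, and the usage section swaps in stubs for deserializeRef and deserialize so the block runs on its own.

```javascript
// True for objects shaped like { "$ref": { "$collection": ..., "id": ... } }.
function isRef(data) {
    return data !== null && typeof data === 'object' &&
        Object.prototype.hasOwnProperty.call(data, '$ref');
}

// Hypothetical wrapper: routes $ref objects to refHandler and everything
// else to fallback. Both take (data, cb), like the functions in this post.
function makeRefAwareDeserializer(refHandler, fallback) {
    return function (data, cb) {
        if (isRef(data)) {
            refHandler(data['$ref'], cb);   // pass the inner { $collection, id }
            return;
        }
        fallback(data, cb);
    };
}

// Usage with stubs standing in for deserializeRef and deserialize.
var calls = [];
var deserializeAny = makeRefAwareDeserializer(
    function (ref, cb) { calls.push('ref:' + ref['$collection'] + '/' + ref.id); cb(null, {}); },
    function (data, cb) { calls.push('plain'); cb(null, data); }
);
deserializeAny({ '$ref': { '$collection': 'People', 'id': 3 } }, function () {});
deserializeAny({ name: 'Steve' }, function () {});
// calls is now ['ref:People/3', 'plain']
```

In practice the isRef check probably belongs inside deserialize() itself, ahead of the $type lookup, so that references nested inside maps are caught by the recursion as well.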

Minification concerns

The mergeFromSerialized code is still a bit irritating, in that we have to manually write out a line for each property.

It's tempting to rewrite it to something like this:

Person.prototype.mergeFromSerialized = function(obj, cb) {
    var myFields = ['name', 'dob'];
    deserialize(obj, _.bind(function(err, data) {
        if (err) { cb(err); return; }
        _.each(myFields, function(field) {
            this[field] = take(data, field, this[field]);
        }, this);
        // Call back once, after all fields are merged.
        cb(null, this);
    }, this));
};

Unfortunately there's a glaring problem with this code: you've just trashed static analysis. If you are using a compiler which aggressively renames class members, your deserialization will fail, because this code writes to this['dob'] while the rest of your compiled code reads the property under its renamed form, this.a.
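
The failure is easy to simulate by hand. The snippet below imitates what a renaming compiler might emit; the names P, a and b, and the original getDob() accessor, are all invented for this illustration.

```javascript
// Hand-written imitation of compiler output: `dob` was renamed to `a`
// and a hypothetical getDob() accessor was renamed to `b`.
function P() { this.a = null; }
P.prototype.b = function () { return this.a; };

var p = new P();

// The string-keyed write from the field-list version is left untouched
// by the compiler, so it creates a separate, unrelated property.
p['dob'] = '1980-01-01';

var seenByRestOfApp = p.b();   // null: the renamed property was never set
var strayProperty = p.dob;     // '1980-01-01'
```

Nothing throws, which is what makes this unpleasant: the data simply lands somewhere no other code will look.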

Such aggressive minification might not matter much for a smaller project, but if your compiled JS file measures several megabytes, it's useful to be able to trim the source.

I am not really sure what the answer to this is, other than auto-generating the long form serialization source code.
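
For what it's worth, the auto-generation route could start as small as the sketch below: a build-time function that turns a field list into the long-form source text. generateMergeSource is a hypothetical name, and the sketch assumes every field name is a valid JavaScript identifier.

```javascript
// Hypothetical build-time generator. It emits source text in which the
// left-hand side is a dotted property access (which a compiler can rename
// consistently) and the serialized key stays a string literal.
function generateMergeSource(className, fields) {
    var fieldLines = fields.map(function (field) {
        return "        this." + field + " = take(data, '" + field + "', this." + field + ");";
    });
    return [
        className + ".prototype.mergeFromSerialized = function(obj, cb) {",
        "    deserialize(obj, _.bind(function(err, data) {",
        "        if (err) { cb(err); return; }"
    ].concat(fieldLines, [
        "        cb(null, this);",
        "    }, this));",
        "};"
    ]).join("\n");
}

var source = generateMergeSource('Person', ['name', 'dob']);
// source now contains the line:
//         this.dob = take(data, 'dob', this.dob);
```

The output would be written to a file and fed to the compiler with the rest of the source, so the field list lives in one place and the compiler still sees plain property accesses.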
