Binary JSON with bson4jackson

Recently, JSON has become an excellent alternative to XML, but most JSON parsers written in Java are still rather slow. In my search for faster libraries I found two things: BSON and Jackson.

BSON is binary-encoded JSON. The format was designed to be fast for machines to read. BSON has gained prominence as the main data exchange format of the document-oriented database management system MongoDB.

According to the JVM serializers benchmark, Jackson is one of the fastest JSON processors available. Apart from that, Jackson allows writing custom extensions, which can be used to add further data exchange formats.

bson4jackson

This is where bson4jackson steps in. The library extends Jackson with the ability to read and write BSON documents. Since bson4jackson is fully integrated, you can use Jackson’s convenient API to serialize simple POJOs. Consider the following class:

public class Person {
  private String _name;

  public void setName(String name) {
    _name = name;
  }

  public String getName() {
    return _name;
  }
}

You may use the ObjectMapper to quickly serialize objects:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import com.fasterxml.jackson.databind.ObjectMapper;
import de.undercouch.bson4jackson.BsonFactory;

public class ObjectMapperSample {
  public static void main(String[] args) throws Exception {
    //create dummy POJO
    Person bob = new Person();
    bob.setName("Bob");

    //serialize data
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    ObjectMapper mapper = new ObjectMapper(new BsonFactory());
    mapper.writeValue(baos, bob);

    //deserialize data
    ByteArrayInputStream bais = new ByteArrayInputStream(
      baos.toByteArray());
    Person clone_of_bob = mapper.readValue(bais, Person.class);

    assert bob.getName().equals(clone_of_bob.getName());
  }
}

Or you may use Jackson’s streaming API and serialize the object manually:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import de.undercouch.bson4jackson.BsonFactory;

public class ManualSample {
  public static void main(String[] args) throws Exception {
    //create dummy POJO
    Person bob = new Person();
    bob.setName("Bob");

    //create factory
    BsonFactory factory = new BsonFactory();

    //serialize data
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    JsonGenerator gen = factory.createJsonGenerator(baos);
    gen.writeStartObject();
    gen.writeFieldName("name");
    gen.writeString(bob.getName());
    gen.writeEndObject(); //close the object explicitly
    gen.close();

    //deserialize data
    ByteArrayInputStream bais = new ByteArrayInputStream(
      baos.toByteArray());
    JsonParser parser = factory.createJsonParser(bais);
    Person clone_of_bob = new Person();
    parser.nextToken();
    while (parser.nextToken() != JsonToken.END_OBJECT) {
      String fieldname = parser.getCurrentName();
      parser.nextToken();
      if ("name".equals(fieldname)) {
        clone_of_bob.setName(parser.getText());
      }
    }

    assert bob.getName().equals(clone_of_bob.getName());
  }
}

Optimized streaming

One disadvantage of BSON is that each document begins with a number denoting the document’s length. When creating an object, this length has to be known in advance, so bson4jackson is forced to buffer the whole document before it can be written to the OutputStream. However, bson4jackson’s parser ignores this length field, so you may also leave it empty. To do so, create the BsonFactory as follows:

BsonFactory fac = new BsonFactory();
fac.enable(BsonGenerator.Feature.ENABLE_STREAMING);

This trick can increase serialization performance for large documents and reduce the memory footprint considerably. The official MongoDB Java driver also ignores the length field, so you can use this optimization even if your bson4jackson-created documents are to be read by the MongoDB driver.
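
To make the role of the length field concrete, here is a minimal sketch that encodes the document {"name": "Bob"} by hand, using only the JDK and following the layout described in the BSON specification. It does not use bson4jackson and is not meant to show how the library works internally; the class and method names are made up for illustration. Note that the body has to be assembled completely before the 4-byte length prefix can be written, which is the buffering that ENABLE_STREAMING avoids.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only, not part of bson4jackson.
public class BsonLengthPrefixSketch {
  // Encodes a 32-bit integer in little-endian byte order, as BSON requires.
  static byte[] int32(int value) {
    return ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(value).array();
  }

  // Encodes a document with exactly one string field: {key: value}.
  static byte[] encodeSingleStringDocument(String key, String value) {
    byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
    byte[] valueBytes = value.getBytes(StandardCharsets.UTF_8);

    // The body is built first because its size is unknown until
    // every element has been encoded.
    ByteArrayOutputStream body = new ByteArrayOutputStream();
    body.write(0x02);                             // element type: UTF-8 string
    body.write(keyBytes, 0, keyBytes.length);     // field name ...
    body.write(0x00);                             // ... as zero-terminated cstring
    byte[] stringLength = int32(valueBytes.length + 1);
    body.write(stringLength, 0, stringLength.length); // length incl. terminator
    body.write(valueBytes, 0, valueBytes.length);
    body.write(0x00);                             // string terminator
    body.write(0x00);                             // document terminator

    // Only now can the prefix be written. It counts its own 4 bytes as well.
    int totalLength = 4 + body.size();
    ByteBuffer doc = ByteBuffer.allocate(totalLength).order(ByteOrder.LITTLE_ENDIAN);
    doc.putInt(totalLength);
    doc.put(body.toByteArray());
    return doc.array();
  }

  public static void main(String[] args) {
    byte[] doc = encodeSingleStringDocument("name", "Bob");
    System.out.println(doc.length); // prints 19
    StringBuilder hex = new StringBuilder();
    for (byte b : doc) {
      hex.append(String.format("%02x ", b & 0xff));
    }
    System.out.println(hex.toString().trim());
  }
}
```

For this input the encoded document is 19 bytes long, so its first four bytes are 13 00 00 00 in little-endian order.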

Performance

Version 1.1.0 of bson4jackson introduced support for Jackson 1.7 as well as many performance improvements. As of January 2011, bson4jackson is much faster than the official MongoDB driver for Java. For serialization, this is only true when using the streaming API, since Jackson’s ObjectMapper adds a little overhead (the MongoDB driver itself also uses a kind of streaming API). Deserialization is always faster. The latest benchmark results can be reviewed on the following website:

https://github.com/eishay/jvm-serializers/wiki

Compatibility with MongoDB

In version 1.2.0, bson4jackson’s compatibility with MongoDB was improved considerably. Thanks to a contribution by James Roper, the BsonParser class now supports the HONOR_DOCUMENT_LENGTH feature. It makes the parser honor the first 4 bytes of a document, which usually contain the document’s size. Of course, this only works if BsonGenerator.Feature.ENABLE_STREAMING was not enabled during document generation.

This feature can be useful for reading consecutive documents from an input stream produced by MongoDB. You can enable it as follows:

BsonFactory fac = new BsonFactory();
fac.enable(BsonParser.Feature.HONOR_DOCUMENT_LENGTH);
BsonParser parser = (BsonParser)fac.createJsonParser(...);
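
What the length prefix makes possible can be sketched with the JDK alone. The following hypothetical helper (not part of bson4jackson, and not necessarily how BsonParser implements HONOR_DOCUMENT_LENGTH) splits a stream of concatenated BSON documents by reading each 4-byte little-endian length and then exactly that many bytes. It assumes every document carries a valid length, which is not the case for documents generated with ENABLE_STREAMING.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only, not part of bson4jackson.
public class BsonDocumentSplitterSketch {
  // Reads one length-prefixed document, or returns null at a clean end of stream.
  static byte[] readDocument(InputStream in) throws IOException {
    int first = in.read();
    if (first < 0) {
      return null; // no further document
    }
    DataInputStream data = new DataInputStream(in);
    byte[] prefix = new byte[4];
    prefix[0] = (byte) first;
    data.readFully(prefix, 1, 3); // throws EOFException if the prefix is cut off

    int length = ByteBuffer.wrap(prefix).order(ByteOrder.LITTLE_ENDIAN).getInt();
    if (length < 5) {
      // 4 bytes prefix + 1 byte terminator is the smallest valid document
      throw new IOException("Invalid BSON document length: " + length);
    }

    // Copy the prefix and read the rest of the document (length - 4 bytes).
    // A real reader should also cap the length to guard against huge allocations.
    byte[] doc = Arrays.copyOf(prefix, length);
    data.readFully(doc, 4, length - 4);
    return doc;
  }

  // Splits the whole stream into its consecutive documents.
  static List<byte[]> readAll(InputStream in) throws IOException {
    List<byte[]> docs = new ArrayList<>();
    byte[] doc;
    while ((doc = readDocument(in)) != null) {
      docs.add(doc);
    }
    return docs;
  }

  public static void main(String[] args) throws IOException {
    byte[] empty = {5, 0, 0, 0, 0}; // the empty document {}
    byte[] bob = {0x13, 0, 0, 0, 0x02, 'n', 'a', 'm', 'e', 0,
        4, 0, 0, 0, 'B', 'o', 'b', 0, 0}; // {"name": "Bob"}
    ByteArrayOutputStream joined = new ByteArrayOutputStream();
    joined.write(empty, 0, empty.length);
    joined.write(bob, 0, bob.length);

    List<byte[]> docs = readAll(new ByteArrayInputStream(joined.toByteArray()));
    System.out.println(docs.size()); // prints 2
  }
}
```

Each returned byte array is a complete document that could then be handed to a parser on its own.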

Compatibility with Jackson

bson4jackson 2.x is compatible with Jackson 2.x and higher. Due to some compatibility issues, both libraries’ major and minor version numbers have to match. That means you have to use at least bson4jackson 2.1 if you use Jackson 2.1, bson4jackson 2.2 if you use Jackson 2.2, etc. I will try to keep bson4jackson up to date. If there is a compatibility issue, I will update bson4jackson, usually within a couple of days after the new Jackson version has been released.

Here’s the compatibility matrix for the current library versions:

                    Jackson 2.7.x   Jackson 2.6.x   Jackson 2.5.x
bson4jackson 2.7.x  Yes             Yes             Yes
bson4jackson 2.6.x  No              Yes             Yes
bson4jackson 2.5.x  No              No              Yes
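
The matrix follows a simple pattern: a bson4jackson 2.x release works with a Jackson 2.x release whose minor version is the same or lower. The sketch below encodes that rule as a small check. It is only an illustration: the class and method names are hypothetical, and the assumption that the rule also holds for versions outside the matrix is mine and not confirmed here.

```java
// Hypothetical helper for illustration only, not part of bson4jackson.
public class CompatibilityCheckSketch {
  // Extracts the minor version from a 2.x version string such as "2.7.0".
  static int minorOf(String version) {
    String[] parts = version.split("\\.");
    if (parts.length < 2 || !parts[0].equals("2")) {
      throw new IllegalArgumentException("Expected a 2.x version: " + version);
    }
    return Integer.parseInt(parts[1]);
  }

  // Assumed rule derived from the matrix: bson4jackson's minor version
  // must be at least as high as Jackson's minor version.
  static boolean isCompatible(String bson4jacksonVersion, String jacksonVersion) {
    return minorOf(bson4jacksonVersion) >= minorOf(jacksonVersion);
  }

  public static void main(String[] args) {
    System.out.println(isCompatible("2.7.0", "2.5.3")); // prints true
    System.out.println(isCompatible("2.5.0", "2.7.0")); // prints false
  }
}
```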

If you’re looking for a version compatible with Jackson 1.x, please use bson4jackson 1.3. It’s the last version of the 1.x branch. bson4jackson 1.3 is compatible with Jackson 1.7 up to 1.9.

Download

Pre-compiled binaries

Pre-compiled binary files of bson4jackson can be downloaded from Maven Central. You will also need a copy of Jackson to get started.

Maven/Gradle/buildr/sbt

Alternatively, you may use Maven to download bson4jackson:

<dependencies>
  <dependency>
    <groupId>de.undercouch</groupId>
    <artifactId>bson4jackson</artifactId>
    <version>2.7.0</version>
  </dependency>
</dependencies>

For Gradle you may use the following snippet:

compile 'de.undercouch:bson4jackson:2.7.0'

For buildr use the following snippet:

compile.with 'de.undercouch:bson4jackson:jar:2.7.0'

If you’re using sbt, you may add the following line to your project:

val bson4jackson = "de.undercouch" % "bson4jackson" % "2.7.0"

License

bson4jackson is licensed under the Apache License, Version 2.0.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.


Posted by Michel Krämer
on January 30th, 2011.