Comparison Between JSON and YAML for Data Serialization
Malin Eriksson
Victor Hallberg
Abstract
This report determines and discusses the primary differences between two
serialization formats, YAML and JSON. A general introduction to the concepts
of serialization and parsing is provided first, which also explains how they can
be used to transfer and store data. This is followed by an analysis of the YAML
and JSON formats, in which functionality, primary use cases, and syntax are
described. In addition, the perceived performance of implementations of both
formats is investigated by conducting a number of tests. Using the combined
background information and test results, conclusions regarding the main
differences between the two formats are then drawn and discussed.
Referat (Summary)
This report covers and discusses the primary differences between two
serialization formats: YAML and JSON. First, a general introduction to the
concepts of serialization and parsing is given, which also explains how they can
be used to transfer and store data. This is followed by a more in-depth analysis
of YAML and JSON, in which functionality, primary use cases, and syntax are
described. In addition, the performance of implementations of the two formats
is examined by means of a number of tests. Finally, the combined background
information and the results of the tests are used to demonstrate the largest
differences between them.
Contents
Preface
1 Introduction
1.1 Problem statement
2 Background
2.1 Serialization
2.1.1 Definition
2.1.2 General method
2.1.3 Scope of use
2.2 Parsing
2.2.1 General
2.2.2 Serialization and parsing
2.3 JSON
2.3.1 General
2.3.2 Origin
2.3.3 Functionality
2.3.4 Syntax
2.3.5 Scope of use
2.3.6 Process
2.4 YAML
2.4.1 General
2.4.2 Origin
2.4.3 Functionality
2.4.4 Syntax
2.4.5 Scope of use
2.4.6 Process
3 Methods
3.1 Testing tools and environment
3.1.1 Programming language
3.1.2 JSON implementation
3.1.3 YAML implementation
3.1.4 Environment
3.2 Testing procedure
3.3 Test data
3.3.1 Simple data set
3.3.2 Complex data set
3.4 Tests
3.4.1 Serialization performance
3.4.2 Deserialization performance
3.4.3 Serialization output size
4 Results
4.1 General differences
4.1.1 Data types
4.1.2 Structures
4.1.3 Implementation
4.1.4 Readability
4.1.5 Universality
4.1.6 Syntax
4.1.7 Scope of use
4.2 Performance
4.2.1 Serialization performance
4.2.2 Deserialization performance
4.2.3 Serialization output size
5 Conclusions
5.1 Theoretical research
5.1.1 Design goals
5.1.2 Functionality
5.1.3 Similarity and evolution
5.2 Performance
5.3 Format of choice
5.3.1 Functionality
5.3.2 Readability
5.3.3 Performance
References
Preface
The work of this study was divided into different stages. It started with a
pre-study, which was conducted by both group members. Theoretical research on
the subject was then collected, for which Malin was primarily responsible. This
was done in parallel with the preparations for the performance testing, which
were carried out by Victor. Throughout the whole process, both of us have been
writing different parts of the report. In general, most of the technical issues
have been Victor's responsibility, whereas Malin has been doing background
research and compiling the facts found. Most sections have finally been revised
by both members.
Chapter 1
Introduction
This paper discusses and compares serialization formats in computer science.
In addition to giving a background on serialization and how it is used today,
it compares two important lightweight data interchange formats, namely JSON
and YAML. Although they are quite similar when it comes to usability, there
are some distinctions between them, mostly regarding design choices and syntax.
These choices do, however, affect their scope of use. The report aims to discuss
the differences between the languages, along with their consequences for
performance and usability in different use cases.
Chapter 2
Background
This chapter explains and defines background facts regarding the different parts of
this report. It includes information about the serialization and parsing process, but
focuses on the serialization languages to be compared - YAML and JSON.
2.1 Serialization
2.1.1 Definition
Serialization is a process for converting a data structure or object into a format that
can be transmitted over a wire, or stored somewhere for later use [8].
There are many different methods and formats that can be used for serialization.
Which method and format to choose depends on the requirements placed on
the object or data, and on the purpose of the serialization (sending or storing). The choice
may also affect the size of the serialized data as well as serialization/deserialization
performance in terms of processing time and memory usage.
in its serialized form. Ruby's Marshal module returns a string which, however,
is not completely readable, as it contains special byte sequences and is not
formatted to be easily read by a human.
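As a minimal sketch of this behaviour, the snippet below round-trips a small hash through Ruby's built-in Marshal module. The sample data is made up for illustration and is not taken from the study.

```ruby
# Round-trip a small structure through Ruby's built-in Marshal module.
# The address-book style data is a made-up example.
data = { "name" => "Alice", "phones" => [123, 456] }

dumped = Marshal.dump(data)       # binary String, not meant for human readers
restored = Marshal.load(dumped)   # rebuilds an equal Ruby object

puts dumped.inspect               # inspect reveals the embedded special bytes
puts restored == data
```

Printing the dumped string with inspect makes the non-printable bytes visible, which is why the format is unsuitable for manual editing.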
Example
A concrete example where serialization is needed is when storing information from
an address book, in this case written in Java. Every instance contains a person with
details about their address and phone number. One wants to store all instances on
a server exactly as they were created, and there are a few possible solutions:
1. By using Java serialization, which is part of the language. This can easily be
done, but problems arise if the data would have to be accessible to applications
written in C++, Python or another language as the data is serialized in a way
unique to Java.
2. By using an improvised way of encoding the data into single strings, such as
encoding four integers into for example 12:3:-23:67. This solution requires
some custom parsing code to be written, and is most efficiently used when
converting very simple data.
3. By serializing the data into XML. It is an attractive method because
XML is human-readable and has bindings (API libraries) for many
languages, although it is space intensive and can cause performance penalties
in applications.
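The improvised encoding in option 2 can be sketched in a few lines of Ruby. The helper names encode_ints and decode_ints are hypothetical and only serve to show how much custom parsing code even a trivial format requires.

```ruby
# Option 2 sketched: an ad hoc, colon-separated encoding for integers.
# encode_ints and decode_ints are hypothetical helper names.
def encode_ints(values)
  values.join(":")
end

def decode_ints(text)
  # Integer() raises on malformed parts instead of silently returning 0.
  text.split(":").map { |part| Integer(part) }
end

encoded = encode_ints([12, 3, -23, 67])
decoded = decode_ints(encoded)

puts encoded
puts decoded.inspect
```

The sketch only handles flat lists of integers; nested or mixed data would need additional rules, which is the limitation the list above points out.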
2.2 Parsing
2.2.1 General
In computer science, the term parsing generally means analyzing written text
and determining its grammatical structure with respect to a known formal grammar. In linguistic
terms, to parse means to analyze and describe the grammar of a sentence. The parser
splits an expression into tokens, which are then inserted into some kind of data
structure. This data is then evaluated to interpret the meaning of each expression
according to the rules of the given grammar, followed by execution of the appropriate action.
2.3 JSON
2.3.1 General
JSON is a subset of the open ECMAScript standard [3] (of which the JavaScript
programming language is an implementation). It was created as a way
to parse human-readable (plain text) representations of data into valid
ECMAScript objects [7]. It is completely language independent and uses notation
similar to that of common programming languages such as C, C++ and Java.
The format has become very popular in cases where serialization and interchange
of structured data over networks is needed [13]. It is often associated with the modern
web because it is frequently used for communication between a web
server and a client-side web application.
2.3.2 Origin
JSON was originally introduced as a written specification by Douglas Crockford in
2001 [4], who used the format within his company State Software. Crockford was not
the first person to invent the object notation, as other individuals had discovered it
independently at about the same time, but he was the first to give it a complete
specification based on parts of the JavaScript standard. He then launched
the JSON.org website in 2002, which still exists and currently provides a listing of
JSON libraries for different programming languages [3]. The format quickly grew in popularity,
partly thanks to its simplicity, which made it much more lightweight (resulting in
faster load times over the Internet) than XML, a format frequently used on
the web. The other reason for the growth is the increased use of JavaScript
on the web.
JSON documents can be parsed in JavaScript by calling the built-in eval function
with the JSON string provided as an argument. The JavaScript interpreter
will then execute the argument as JavaScript code, constructing an object with
the properties defined by the JSON string. This works because
JavaScript is a superset of JSON. Using the eval function is theoretically the most
efficient way to parse JSON, as it just invokes the JavaScript interpreter (without
any security or constraint checks). The method is inelegant, however,
since the interpreter does not prevent arbitrary JavaScript code from being executed. In
most cases a dedicated JSON parser should be used to avoid security issues and
to accept only valid JSON as input. Most modern browsers have had fast native
JSON parsers since 2009, which are preferred over eval [13].
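The same principle can be illustrated outside the browser. The sketch below uses Ruby's json library rather than JavaScript (an assumption made to stay in the language used for the tests): a dedicated parser turns valid JSON into data and rejects input that is code, instead of executing it.

```ruby
require "json"

# Valid JSON is turned into plain data structures.
valid = JSON.parse('{"name": "Alice", "admin": false}')

# Input that is code rather than JSON is rejected instead of executed.
rejected = begin
  JSON.parse('system("echo unsafe")')
  false
rescue JSON::ParserError
  true
end

puts valid.inspect
puts rejected
```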
2.3.3 Functionality
JSON is a human-readable language, designed foremost for simplicity and
universality. It implements the basic data types available in most modern programming
languages [6]. The fact that it is also easy to read and parse contributes to its
usefulness in programming. JSON is also language-independent, meaning that the
specification is not tied to any specific programming language (although it was originally
based on the JavaScript object notation).
The JSON standard does not support object references, which affects, for example, the ability
to store cyclic structures. This functionality can be provided by an
extension such as dojox.json.ref from the Dojo Toolkit [19], which enables JSON objects to
be marked with specific ids that can later be referenced.
Complex structures can also be built by nesting associative arrays (objects within objects).
JSON objects can contain values of any valid data type, enabling deep data hierarchies in
JSON documents.
The JSON format specification does not include support for validation of values
or structure, but an external specification called JSON Schema exists as a draft [20].
JSON Schema can be used to define the structure of a JSON document much like
an XML Schema, for example which data types values should have and whether they are
optional or required. The defined schema can then be used to validate
JSON documents or as a way to document application APIs.
• Boolean (true/false)
• Null
2.3.4 Syntax
As described above, JSON consists of objects, arrays and scalars. This section
describes the general syntax and is intended to give an overview of the
language and its usability regarding semantics. An example of an arbitrary JSON
document can be found in table 2.1 (with code converted from the YAML example [6]).
• Comments are not allowed in the current standard (they were removed by the
author in a later revision of the specification [4] ).
• Identifiers must be enclosed in quotes (as a string) and are followed by a colon
and value.
• Arrays (ordered set of values) are placed within brackets ([]) and separated
by commas.
2.3.6 Process
JSON is parsed (deserialized) by reading the input character by character, constructing
structures and objects in a single pass. JavaScript implementations allow an
external function (called a reviver) to be provided as a parameter, allowing more
specific transformation of the data. Serialization is also done in a single iteration
through the data structure, where most implementations call a to_json (or similarly
named) method, defined either by the implementation or by the user,
and then append the result of this method call to the JSON output.
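A minimal sketch of the to_json convention, using Ruby's json library. The Point class is hypothetical and exists only for illustration; it describes itself in terms of basic JSON types so that the generator can include it in the output.

```ruby
require "json"

# Hypothetical class used only for illustration.
class Point
  def initialize(x, y)
    @x = x
    @y = y
  end

  # Describe the object using basic JSON types; the generator calls this.
  def to_json(*args)
    { "x" => @x, "y" => @y }.to_json(*args)
  end
end

output = JSON.generate([Point.new(1, 2)])
parsed = JSON.parse(output)   # yields plain hashes, not Point instances

puts output
puts parsed.first.class
```

Note that the round trip is lossy: parsing the output gives back a generic hash, so restoring a Point would require extra application code.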
2.4 YAML
2.4.1 General
YAML is a recursive acronym for YAML Ain’t Markup Language, emphasizing
its design as a data storage format. It is a lightweight, human-readable serialization
language primarily designed to be easy to read and edit. By adding a simple typing
system and an aliasing mechanism on top of the three most common data structures used
when serializing (hashes, arrays and strings), it forms a language which is easy to
use while still including more complex features [6].
2.4.2 Origin
YAML was first proposed by Clark Evans in 2001, who then designed it together
with Ingy döt Net and Oren Ben Kiki [6]. The format was developed from experience
and discussions among sml-dev members on the Internet, and is still updated based
on user input from the YAML-core mailing list [6].
2.4.3 Functionality
Because YAML needs to be easily extensible and readable by humans,
it is mainly built upon concepts from C, Java, Perl, Python
and Ruby. The main design goals are to mimic the native data structures of modern
languages as closely as possible, to support parsing, and to have a consistent model that
supports generic tools. This can lead to some complications when generating and
parsing YAML documents [6].
YAML can be considered a superset of JSON, providing syntax for improved
human readability along with a more complete information model (support
for additional types) [6]. JSON files are often valid YAML files because
JSON’s semantic structure is equivalent to YAML’s inline style (which
was added in the new v1.2 specification of YAML). This means that YAML parsers
adhering to the new specification should also be able to parse most JSON files.
In addition to the basic data types available in JSON, YAML also supports
relational trees. Relational trees are a language construct with which references
to other nodes in the YAML document can be made [6]. A node in the YAML
document tree can be defined as an anchor, and later references to that anchor will
then include the data of the anchored node at the point of reference. Smart use of this feature
can lead to increased readability, compactness and clarity, along with less chance of
data entry errors.
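The anchor mechanism can be sketched with Ruby's bundled YAML library (Psych). The configuration keys are made up, and the aliases: keyword assumes a reasonably recent Psych version (3.1 or later), where alias support has to be enabled explicitly for safe_load.

```ruby
require "yaml"

# "defaults" is anchored once (&defaults) and referenced twice (*defaults).
doc = <<~YAML
  defaults: &defaults
    host: localhost
    port: 8080
  development: *defaults
  test: *defaults
YAML

# Aliases must be enabled explicitly when using safe_load.
config = YAML.safe_load(doc, aliases: true)

puts config["development"].inspect
puts config["test"] == config["defaults"]
```

The shared settings are written once, which is the reduction in repetition and data entry errors described above.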
The YAML specification also allows user-defined data types to be declared, as
well as explicit data typing. This is especially useful for serialization purposes,
allowing a parser to automatically construct an object of the correct class when
deserializing, instead of a generic collection.
YAML structures include nodes and tags. A node represents a single native
data structure, which can be a scalar, sequence or mapping. Each node can be
marked with a tag, which restricts the set of possible values of that node. A tag
works as an identifier for data structures. YAML defines two different types of tags.
Local tags are specific to a single application and start with an exclamation mark.
Global tags are global across all applications and are defined as URIs [22]. Tags are
primarily used to associate metadata with nodes, for example by telling the YAML
parser what kind of user-defined object a node represents (which the parser then
may deserialize the data into).
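A sketch of how a local tag lets a parser rebuild a user-defined object, using Ruby's bundled YAML library. The Account class is hypothetical. Recent Psych versions only reconstruct arbitrary objects through unsafe_load, so the snippet falls back to load where that method does not exist.

```ruby
require "yaml"

# Hypothetical class used only for illustration.
class Account
  attr_reader :owner, :balance

  def initialize(owner, balance)
    @owner = owner
    @balance = balance
  end
end

text = YAML.dump(Account.new("Alice", 100))
puts text   # the document carries a local !ruby/object tag naming the class

# Recent Psych versions only rebuild arbitrary objects via unsafe_load.
loader = YAML.respond_to?(:unsafe_load) ? :unsafe_load : :load
restored = YAML.public_send(loader, text)

puts restored.class
```

Reconstructing objects from tags should only be done with trusted input, which is why newer library versions make it an explicit choice.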
• Boolean (true/false)
• Null
2.4.4 Syntax
This section describes the YAML syntax and gives an overview of the language.
An example of an arbitrary YAML document can be found in table 2.2 [6].
General:
• Comments begin with a hash/number sign (#) and continue to the end of the
current line.
• One mapping per line, marked with an identifier followed by a colon and space
(key: value).
• An inline format which mimics the JSON object notation is also available.
Associative arrays are in this case enclosed in braces, with items separated
by commas.
Sequences (arrays):
• An alternative inline syntax exists, where the list is enclosed in brackets and
items are separated by a comma followed by space.
Structures:
• Three repeated dashes denote the start of a document, and are also used to
separate multiple documents in a single transmission.
• Ending a transmission along with the current document is done with three
repeated dots.
• Repeating nodes are defined with an ampersand and later referenced with an
asterisk, where each character is followed by an identifier.
• A question mark and space at the beginning of a line denote sets, which are
unordered.
• Explicit data type casting is done by prefixing the value in the same way as
with user defined types but with an additional exclamation mark.
Strings:
• Quoting is often not required, but strings can be quoted using either single or double quotes.
• The single quoted style is useful when no escaping is needed, while the double
quoted style allows for escape sequences. A double quoted string can span multiple lines, in
which case line breaks are folded; a literal newline is included with the newline escape sequence (\n).
• Strings can be written using either the standard inline style (with or without
quotes) or with block notation where an initial symbol determines how newlines
in the document should be handled.
• Strings in block notation denoted by a pipe (|) will have their newlines
preserved, while the greater than sign (>) will tell the YAML parser to convert
newlines to spaces.
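The two block notations can be compared directly with a short sketch using Ruby's bundled YAML library. The keys literal and folded are made-up names for this example.

```ruby
require "yaml"

doc = <<~YAML
  literal: |
    first line
    second line
  folded: >
    first line
    second line
YAML

data = YAML.safe_load(doc)

puts data["literal"].inspect   # newlines between the lines are preserved
puts data["folded"].inspect    # the inner newline is converted to a space
```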
2.4.6 Process
The YAML specification outlines four stages of data when loading and dumping
to and from the format [6]. Native data (in the program environment) is seen as
the first stage. The serialized YAML document (string) is the last stage of data.
The two stages in between can be seen as working stages, where the data has been
transformed into a node graph or event tree to be further processed.
Serialization, or dumping as it is referred to, is done in three distinct stages which
convert the data from a native data structure into a series of bytes (a string). First, a
directed graph is generated containing the structure, with nodes for sequences, mappings
and scalars. The graph is then serialized: for sequential access mediums it
must be represented as an ordered tree, called a serialization tree. As general mapping
keys are unordered, this step imposes an ordering on them. Finally,
the serialization tree is converted into a Unicode string.
The load (deserialization) process also comprises three stages, which
together do the reverse. The input (a string) is parsed to create a serialization
tree in which the node hierarchy, keys, values and ordering are defined. This tree
is then traversed node by node, determining the data types of values and converting
them, as well as constructing relations and sequences. The final step converts
the representation graph into native data structures.
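Ruby's bundled YAML library exposes an intermediate node tree, which gives a rough view of these stages. The sketch below is an illustration under that assumption; the library's internal steps do not necessarily map one-to-one onto the stages named in the specification.

```ruby
require "yaml"

text = "name: Alice\nphones: [123, 456]\n"

# Parse the character stream into a tree of nodes.
document = YAML.parse(text)
root = document.root
puts root.class                 # the top-level node is a mapping node

# Construct native Ruby data from the node tree.
native = document.to_ruby
puts native.inspect
```

Calling YAML.load performs both steps at once; splitting them makes the intermediate representation visible.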
Chapter 3
Methods
This chapter describes the methods used during the research and testing. This
paper consists of one theoretical study, which builds conclusions from earlier re-
search. From those conclusions a test environment is constructed for testing and
performance comparison.
3.1.4 Environment
The tests were conducted in the following computer environment.
OS          Windows 7 x64
Processor   Core 2 Duo P7350
Memory      4GB DDR3 RAM
Ruby        v1.8.7 (2010-12-23 patchlevel 330) (i386-mingw32)
JSON        json gem (ext) v1.5.1 (x86-mingw32)
YAML        version bundled with Ruby
3.4 Tests
3.4.1 Serialization performance
The performance of the serialization process will be measured as the time taken
for each serialization implementation to serialize a previously generated set of data
present in the script environment into their respective formats. The execution
times for each implementation, measured in seconds, will be used to determine the
perceived performance.
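A timing measurement of this kind can be sketched as follows. This is not the benchmark script used in the study; the data set is made up, and the snippet assumes a current Ruby with the standard benchmark, json and yaml libraries.

```ruby
require "benchmark"
require "json"
require "yaml"

# Made-up data set; the original test data is not reproduced here.
data = (1..1_000).map do |i|
  { "id" => i, "name" => "item #{i}", "tags" => %w[a b c] }
end

# Benchmark.realtime returns the elapsed wall-clock time in seconds.
json_time = Benchmark.realtime { JSON.generate(data) }
yaml_time = Benchmark.realtime { YAML.dump(data) }

puts format("JSON: %.4f s", json_time)
puts format("YAML: %.4f s", yaml_time)
```

A single run like this is noisy; repeating each measurement several times and averaging would give steadier numbers.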
Chapter 4
Results
This chapter presents the results, taken from either testing or conclusions based on
the background information shown in the Background chapter.
4.1.2 Structures
Both formats support lists and associative arrays. YAML includes functionality
like object references and relational trees natively, whereas JSON doesn’t. Object
references can, however, be added to JSON through third-party extensions.
Generating JSON from data which includes cyclic structures should be avoided, as the
format doesn’t support references and the generator will thus end up in an infinite loop unless
special care is taken to prevent this.
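The contrast can be sketched with the smallest possible cyclic structure. The snippet assumes a current version of Ruby's json library, whose generator guards against endless recursion with a nesting limit and raises JSON::NestingError; other implementations may behave differently.

```ruby
require "json"
require "yaml"

# A list that contains itself: the simplest cyclic structure.
cyclic = []
cyclic << cyclic

json_failed = begin
  JSON.generate(cyclic)
  false
rescue JSON::NestingError
  true   # the generator gives up once its nesting limit is exceeded
end

yaml_text = YAML.dump(cyclic)   # the cycle is written as an anchor and an alias

puts json_failed
puts yaml_text
```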
4.1.3 Implementation
The simplicity of the specification makes parsing and generation of JSON trivial.
The YAML specification explicitly outlines three stages for the parsing process.
Along with the added features compared to JSON, this greatly increases the com-
plexity of the parser and serializer.
4.1.4 Readability
JSON has a simple and very easy to learn syntax, with readable output as long
as it includes whitespace in the form of indentation and line breaks. YAML, being
designed to produce easily read documents, nevertheless presents a cleaner output which is
easier to read than an equivalent JSON document. It is considerably more
readable, while also being more compact (unless the JSON is generated without
indentation). Comment syntax is available for YAML, along with multiple ways to
write strings. JSON previously allowed a JavaScript-like comment syntax, but it
was removed from the specification at a rather early stage.
Strings in YAML do not need to be quoted most of the time, neither for keys
nor values, which greatly improves readability. This is a big difference from JSON,
where all strings and identifiers (keys) must be quoted.
4.1.5 Universality
JSON is widespread, being a common standard for many applications on the
web. This has led to most browsers, and many web frameworks, adding built-in
support for the format. YAML, on the other hand, is currently not a common format
for data exchange on the web. The relatively high complexity of YAML results in
higher requirements for implementations.
4.1.6 Syntax
Table 4.2 showcases the differences found in the syntax comparison between
YAML and JSON, using the earlier research presented in the background
section. The 1.2 version of the YAML specification brought increased compatibility
with JSON. They are not completely compatible, however, as YAML requires
whitespace between mappings and key-value pairs, which can be seen in table 4.3.
This example shows that JSON can be written to be valid YAML, but that it does not
work the other way round [21]. Correctly formatted JSON (with whitespace) can be
read by a YAML parser adhering to the v1.2 standard because YAML
also incorporates the alternative inline syntax.
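As a small illustration, the sketch below feeds the same JSON text, written with whitespace after colons and commas, to both a JSON parser and a YAML parser. It uses Ruby's bundled libraries and a made-up document; it does not test the case without whitespace.

```ruby
require "json"
require "yaml"

# JSON written with whitespace after colons and commas.
text = '{"name": "Alice", "phones": [123, 456], "active": true, "note": null}'

from_json = JSON.parse(text)
from_yaml = YAML.safe_load(text)   # read through YAML's inline syntax

puts from_yaml.inspect
puts from_json == from_yaml
```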
... high readability along with support for additional complex features, is foremost
used for files meant to be manipulated by humans, such as configuration files. It
offers extended possibilities for describing complex structures, which can be considered
redundant in plain data exchange.
One could consider YAML to be a good substitute for JSON in many tasks, but
the required whitespace and indentation add little value when the data is
only meant to be parsed by another computer. They only result in longer processing
times, with no apparent benefit. In terms of serialization, there is often no reason
to implement more advanced functionality. The only required task is to
transmit some data set, a task which JSON performs perfectly well in most cases,
and often many times more efficiently (in terms of processing time) than YAML. This is
mostly because YAML is more complex in its structure, which affects
parsing speed negatively, compared to JSON, where parsing is fast and
efficient. This seems to be an important factor to programmers today, making JSON
more commonly used and widespread.
4.2 Performance
This section aims to present the test results from the performance tests described in
the Methods chapter. Testing was conducted on two different data sets, as described
in 3.2. The results are grouped by attribute tested, and results are presented in
tables showcasing the performance of each method for simple and complex data
sets, respectively.
As the JSON standard does not explicitly require indentation (or even spacing
between identifiers and values), there are two options when generating JSON. The
first is to output only what is necessary, which means a smaller data size
(as no whitespace has to be written). This does, however, minimize readability, which
is not the case for the second option: having the serializer include indentation and
whitespace (where appropriate) to maximize readability. Both cases are included in
the tests, as the JSON implementation used provides methods for both compact
(JSON.generate) and formatted (JSON.pretty_generate) generation.
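The two generation modes can be compared on a small, made-up data structure. This sketch only illustrates the size difference and is not part of the measurements reported in this chapter.

```ruby
require "json"

data = { "name" => "Alice", "phones" => [123, 456] }

compact = JSON.generate(data)          # no optional whitespace
pretty = JSON.pretty_generate(data)    # indentation and line breaks added

puts compact
puts pretty
puts "compact: #{compact.bytesize} bytes, pretty: #{pretty.bytesize} bytes"
```

Both strings parse back to the same data; only the amount of whitespace differs.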
The execution times measured for the deserialization (load) process show results
similar to those of the serialization process, which can be seen in table 4.5. Both
implementations are much faster at generating data structures from a serialized string than
at doing the opposite. YAML is slower here as well, but only by a factor of 6 (simple data set) and 4
(complex data set) in this case.
As can be seen in table 4.6, YAML produces the most compact output for
the simple data set, even smaller than the compact JSON output. The prettily
formatted JSON is considerably larger than the YAML output (and, obviously, the
compact JSON output). The smaller size of the YAML output is due to the fact that
quotes account for a noticeable share (roughly 11%) of the JSON
output.
A change can be seen for the complex data set, where the YAML output is
larger than both variants of JSON. As the document hierarchy gets more complex,
with deeper nesting being added, the amount of whitespace needed for YAML to
correctly indent everything grows noticeably.
Chapter 5
Conclusions
5.1.2 Functionality
JSON only supports a simple hierarchy, built through associative arrays and lists.
Extensions exist which enable simulated object references, but this requires some
work and will not be discussed in this comparison. YAML natively supports object
references and relational trees. This enables it to represent cyclic data structures and
deep hierarchies easily. Extended data typing, for both custom types and general
data such as date types, is also implemented, which supports its aim of producing
human-readable files from complicated structures. JSON lacks support for more
complex data types and does not support object references at all. Most advanced
data types can, however, be expressed as a combination of the basic types available in JSON.
An example is how dates are represented as strings.
Simplicity suffers when working with more complicated structures, as human
readability deteriorates, though this seems to have no great impact on
efficiency and parsing. YAML is not as
widely used, likely because most data being stored or transmitted from
servers to clients over the net doesn’t explicitly need the extended functionality.
5.2 Performance
The tests show that the performance of the JSON and YAML implementations
being tested differs greatly. JSON generation and parsing is faster for all sets of data
tested. A simple reason for this (not necessarily exclusive to the implementations
being used) could be that YAML as a format is much more complex due
to the additional features available, and therefore requires more processing when
loading and dumping data. This can be seen by investigating the serialization
process, for which the YAML specification describes four different stages of data, whereas
JSON only requires a single pass over the data.
One general explanation for the YAML implementation’s seemingly bad performance
is that it does more thorough work in the general serialization process.
This includes inspecting objects, inserting tags for custom data types and taking
advantage of the alternative string block notations in YAML. The JSON implementation
used only supports serialization to and from the basic data types, and will
fail for custom objects unless a to_json method has been defined. The deserialization
process does not handle custom types at all, which means that the JSON.parse
method is limited to returning an array or hash with the deserialized values.
Comparing the execution times between simple and complex data yields some
interesting information. YAML seems to handle a growing data set with deeper
hierarchies relatively better than JSON. It is possible that an even bigger data set
could favour YAML even more, possibly ending up as the faster implementation for
data that big.
Looking at the test results and reflecting on the theoretical research, it is clear
that JSON is the favorable serialization format when speed is the most important
factor. As such, JSON is the format to recommend unless the data is very complex
or requires features not available to JSON, such as object references/relations or
the ability to deserialize custom objects into their original form (not generic objects
or arrays). In these cases YAML is a good alternative.
5.3.1 Functionality
YAML features some functionality missing in JSON. If the data to be serialized
includes object references and these are to be preserved, one will have to use YAML, or
JSON together with an extension providing support for this (such as dojox.json.ref [19]).
YAML also includes the ability to tag values as specific data types, either to map
data to user-defined types or to explicitly cast values to other basic types. This allows
YAML implementations to directly transform data structures in the serialized
document into native objects in the programming environment. JSON lacks this ability,
and such a solution will have to be developed separately if complete serialization is
desired.
5.3.2 Readability
YAML is the recommended choice if high human readability is desired, as even the
“pretty” output of the JSON generator gets increasingly harder for a human to read
as the data set grows. This factor is of little importance when the serialized data
is only meant to be transferred and parsed by another computer.
5.3.3 Performance
The performance testing proved the JSON implementation to be many times faster
than YAML for both serialization (dumping) and deserialization (loading). The
complexity of the YAML processing was most likely the biggest reason behind this.
The relevance of this depends on how critical processing speed is in the project, as
both implementations processed the data in respectable times.
References
[1] Victor Hallberg, 2011. Ruby script used to benchmark JSON and YAML.
<http://github.com/mogelbrod/json-yaml-benchmark>
[6] Ben Kiki O. YAML Ain’t Markup Language (YAML) Version 1.2.
(Accessed 2011-02-21).
<http://www.yaml.org/spec/1.2/spec.html>
[12] Google. Using JSON in the Google Data Protocol. (Accessed 2011-02-21).
<http://code.google.com/intl/sv-SE/apis/gdata/docs/json.html>