Developing Met A Web Apps
Applications
Published 2007-03-08
Copyright © 2007 Metaweb Technologies, Inc.
Table of Contents
1. Introduction
1.1. The Metaweb Query API
1.2. About this Manual
2. Metaweb Architecture
2.1. The Metaweb Object Model
2.1.1. Common Object Properties
2.1.2. Names, Keys, and Ids
2.1.3. Topics
2.2. Values
2.2.1. /type/int
2.2.2. /type/float
2.2.3. /type/boolean
2.2.4. /type/id
2.2.5. /type/text
2.2.6. /type/key
2.2.7. /type/rawstring
2.2.8. /type/uri
2.2.9. /type/datetime
2.3. Types
2.3.1. Core Types
2.3.2. Content Types
2.3.3. Access Control Types
2.4. Domains
2.5. Namespaces
2.6. Access Control
3. The Metaweb Query Language
3.1. JavaScript Object Notation
3.1.1. JSON Literals: null, true, false
3.1.2. JSON Numbers
3.1.3. JSON Strings
3.1.4. JSON Arrays
3.1.5. JSON Objects
3.2. MQL Tutorial
3.2.1. Our First Query
3.2.2. Query/Response Symmetry
3.2.3. Metaweb Object IDs
3.2.4. Multiple Results and Uniqueness Errors
3.2.5. Nested Queries
3.2.6. Asking Metaweb For Objects
3.2.7. Expanded Values and Default Properties
3.2.8. Review: Asking for Values
3.2.9. Too Much Information
3.2.10. The id and name Properties
3.2.11. Numeric Constraints
3.2.12. Textual Constraints: Pattern Matching in Queries
3.2.13. Limiting Queries
3.2.14. The Sort Directive
3.2.15. Ordered Collections
3.2.16. Optional Queries
3.2.17. Using Fully-Qualified Property Names
3.2.18. Wildcards
3.2.19. Expressing AND in Queries
3.2.20. Expressing OR in Queries
3.2.21. Expressing NOT in Queries
3.2.22. Reflective Queries
3.3. The MQL Grammar
4. Metaweb Read Services
4.1. Basic mqlread Queries with Perl
4.1.1. A Better Perl Album Lister
4.2. The mqlread Service
4.2.1. mqlread Input
4.2.2. mqlread Output
4.2.3. Query and Response Envelopes
4.3. A Python Album Lister
4.4. A Metaweb-enabled PHP Web Application
4.5. Metaweb Queries with JavaScript
4.5.1. Listing Albums and Tracks with JavaScript
4.5.2. Client-side MQL Queries with <script>
4.6. mqlread Errors
4.7. mqlread Cursors
4.8. Fetching Content with trans
4.8.1. Browsing Recent Content on freebase.com
4.9. Example: A Metaweb Type Browser
5. The MQL Write Grammar
5.1. MQL Write Tutorial
5.1.1. Creating a Type to Work With
5.1.2. Creating Objects
5.1.3. Connecting Objects
5.1.4. Disconnecting Objects
5.1.5. Writes and Default Properties
5.1.6. Creating and Connecting More Objects
5.1.7. Review: Write Directives
5.1.8. Working with Sets
5.1.9. Bidirectional Links and Reciprocal Properties
5.1.10. Writes and Ordered Collections
5.1.11. Namespaces
5.1.12. Properties, Types, and Domains
5.2. MQL Write Grammar
6. Metaweb Write Services
6.1. Logging in to Metaweb
6.1.1. The Login API
6.2. Making Write Queries
6.2.1. The mqlwrite Query Envelope
6.2.2. The Response Envelope
6.2.3. A mqlwrite Utility Function
Chapter 1. Introduction
Freebase is a vast, free, open online database of structured knowledge, powered and maintained
by Metaweb Technologies (metaweb.com). Users can access and contribute to Freebase at
http://www.freebase.com, or through the Metaweb API explained in this manual. If you visit the
freebase.com website, you'll find that Metaweb has seeded the database with detailed information
about popular music and movies. Figure 1.1 is a sample page from this site:
This manual teaches you how to write Metaweb-enabled programs that interact with Freebase.
It assumes that you already know the "what" and "why" of Freebase, and that you have read the
documentation topics (such as "What is Freebase" and "Freebase Demo") linked from the Freebase
home page http://www.freebase.com/view/.
Metaweb (the company) has developed Metaweb (the technology and API). Freebase (the
open global structured knowledge base) is a high-profile public instantiation of the Metaweb
technology, but is unlikely to be the only instantiation.
This manual documents general Metaweb services and APIs, and relies on Freebase for
example data and example applications. The services and APIs are applicable beyond
Freebase, however, and you'll find that this manual uses the name "Metaweb" far more
than it does the name "Freebase".
http://www.freebase.com/api/service/mqlread?queries={"albums":{"query":
{"type":"/music/artist","name":"The Police","album":[]}}}
There are a lot of braces, quote marks, colons and commas in that URL, but remember that this
is a programmatic API: the query is supposed to be generated by a computer, not pecked out by
human fingers! Translated into English, this query says:
Find an object in the database whose type is "/music/artist" and whose name is
"The Police". Then return its array of albums.
If you got all the punctuation correct, the Metaweb server will respond to this query with a response
of MIME type application/json. The response is plain text, but your browser will probably
not display it to you. Instead, the browser will allow you to save it to a file, which you can then
view from the command line or with any text editor. When you view it, you'll see something like
this:
{
  "status": "200 OK",
  "query": {
    "album": [],
    "type": "/music/artist",
    "name": "The Police"
  },
  "messages": [],
  "result": {
    "album": [
      "Outlandos d'Amour",
      "Reggatta de Blanc",
      "Live in Boston",
      "Zenyatta Mondatta",
      "Ghost in the Machine",
      "Synchronicity"
    ],
    "type": "/music/artist",
    "name": "The Police"
  }
}
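A response like the one above can be unpacked with any JSON parser. The following is a minimal Python sketch, not code from this manual: the function name albums_from_response is hypothetical, the embedded response is an abridged copy of the one shown above, and treating any status that does not begin with "200" as a failure is an assumption.

```python
import json

# Abridged copy of the response shown above, held in a string for illustration.
response_text = """
{
  "status": "200 OK",
  "query": {"album": [], "type": "/music/artist", "name": "The Police"},
  "messages": [],
  "result": {
    "album": ["Outlandos d'Amour", "Reggatta de Blanc", "Synchronicity"],
    "type": "/music/artist",
    "name": "The Police"
  }
}
"""

def albums_from_response(text):
    """Parse a response body and return the list under result.album."""
    response = json.loads(text)
    # Assumption: a status that does not start with "200" means the query failed.
    if not response["status"].startswith("200"):
        raise RuntimeError("query failed: " + response["status"])
    return response["result"]["album"]

albums = albums_from_response(response_text)
for title in albums:
    print(title)
```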
Cookie-Based Authentication
While Metaweb is rolling out and scaling up the freebase.com website, queries like the one
shown above can only be made by users who have registered for a Freebase account.
Metaweb uses cookie-based authentication. This means that if you have logged on at
www.freebase.com, your web browser will have the cookies it needs for the query URL
above to work. But if you enter that URL on a web browser that has never visited Freebase
before, you'll get an HTTP "401 Unauthorized" error.
Once freebase.com is fully deployed, read queries like this will be open to the world and
no cookies will be required.
"album":[]
In the response, those empty square brackets have been filled in with a long list of album names.
(For brevity, a number of live and compilation albums were omitted from the list shown above.)
Making queries from your web browser's location bar is interesting, but it becomes much more
interesting if we make the queries under programmatic control. Imagine a script running on your
own web server that sends queries to Freebase and formats the results as HTML: this is a Metaweb-
enabled web application. It might look like Figure 1.2.
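As a rough illustration of what such a script has to do, the Python sketch below builds the query URL programmatically and turns a list of album names into an HTML list. It makes no network request. The helper names build_mqlread_url and albums_to_html are hypothetical, and the endpoint is simply the mqlread URL used earlier in this chapter.

```python
import html
import json
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint taken from the example URL earlier in this chapter.
MQLREAD_URL = "http://www.freebase.com/api/service/mqlread"

def build_mqlread_url(query, name="albums"):
    """Wrap the query in a named envelope and return the request URL."""
    envelope = {name: {"query": query}}
    # Compact separators keep the URL short; urlencode escapes the JSON text.
    payload = json.dumps(envelope, separators=(",", ":"))
    return MQLREAD_URL + "?" + urlencode({"queries": payload})

def albums_to_html(albums):
    """Format album names as an HTML unordered list, escaping markup."""
    items = ["<li>" + html.escape(a, quote=False) + "</li>" for a in albums]
    return "\n".join(["<ul>"] + items + ["</ul>"])

query = {"type": "/music/artist", "name": "The Police", "album": []}
url = build_mqlread_url(query)

# Round-trip check: decoding the URL gives back the envelope we encoded.
decoded = json.loads(parse_qs(urlsplit(url).query)["queries"][0])
print(url)
print(albums_to_html(["Synchronicity", "Zenyatta Mondatta"]))
```

Generating the JSON with a library rather than by string concatenation avoids the punctuation mistakes that are easy to make when typing such a query by hand.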
Chapter 3: The Metaweb Query Language
This chapter explains the Metaweb Query Language (MQL) that is used to express Metaweb
queries. The syntax is quite a bit more powerful and complex than what was shown in this
introduction.
This chapter includes the source code for the application that created Figure 1.2.
This portion of the Metaweb graph organizes knowledge about something named "Arnold". It
tells us that Arnold is a Person, Politician, Body Builder, and Actor. It tells us that Arnold's
country of birth is Austria, his political party is Republican, and that he acted in something named
"Terminator" (which is an instance of something known as a "Film"). The relationships in the
graph are bi-directional, so this figure also tells us, for example, that Austria has Arnold as a
citizen, the Republican Party has Arnold as a member, and that Terminator has Arnold as a cast
member. (Note that this is an example only. An "Arnold Schwarzenegger" node does exist at
www.freebase.com, but it may or may not have the particular relationships pictured here.)
Arnold
sex: male
birthdate: 1947-July-30
country of birth: Austria
political party: Republican
film: Conan the Barbarian
film: Terminator
film: Kindergarten Cop
elected office: Governor of California
In this view, Arnold is an object with a set of properties. Each property has a name and a value.
What is missing from the view is any kind of typing. In many object-oriented systems, each
property of an object has a known type, and the value of that property must be a member of that
type.
Look back at Figure 2.1 again, and consider the relationships labeled type and instances. Arnold
is an instance of Person, Actor, and Politician. Person, Actor, and Politician are Metaweb types:
they are nodes in the Metaweb graph, but they also impose an object-oriented structure on the
graph. Each type defines a set of properties that its instances are expected to have. Each property
has a name and a type. An object in a Metaweb database, therefore, is a node in the graph, plus
the type that it should be viewed as:
Next, let's consider Arnold as an Actor. Notice that the list of properties above included three
properties named film. This is perfectly fine for a nodes-and-relationships model, but it doesn't
fit our object-oriented model where we expect each property to have a single value. A Metaweb
type may specify whether each of its properties must be unique or not. For the Actor type, we'd
need a non-unique property named film. The type of this property is a set of films that Arnold
has acted in:
Arnold as Actor
Set of Film film: [Conan the Barbarian,
Kindergarten Cop,
Terminator]
Note that the film property is an unordered set of values, not an ordered list of values. If you
wanted to display this set of films to an end user, you would most likely want to arrange them
into alphabetical order, or by release date. You can ask Metaweb to order them for you, or you
can sort them yourself. Some sets, such as the set of tracks on an album, have an implicit order,
and you can ask Metaweb to return the members of the set in this order. We'll see how to do this
in Chapter 3.
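If you choose to sort the set yourself, the client-side work is small. The Python sketch below is illustrative only: it models the film property as a Python set, and the release years are supplied here for the example (they do not come from this manual or from a Metaweb query).

```python
# The film property modeled as an unordered set of names.
films = {"Terminator", "Conan the Barbarian", "Kindergarten Cop"}

# Illustrative release years, supplied for this example only.
release_year = {
    "Conan the Barbarian": 1982,
    "Terminator": 1984,
    "Kindergarten Cop": 1990,
}

# Alphabetical order for display.
by_title = sorted(films)

# Chronological order, using the release year as the sort key.
by_release = sorted(films, key=lambda title: release_year[title])

print(by_title)
print(by_release)
```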
name This property is a set of human-readable names for the object, suitable for display
to the end users of Metaweb. Each name is a /type/text value which holds a
string and defines the human language in which it is written. The name property
is special in two ways:
• An object may have more than one name, but may only have one name per
language. That is, it can have only one English name, only one French name,
and so on.
• When querying Metaweb, you may treat the name property as if it were a single
/type/text value rather than a set of values. Metaweb will automatically
return the object's name (if it has one) in your language of choice.
key This property is a set of fully-qualified names for the object. These fully-qualified
names are intended for use by developers and scripts and are not typically
displayed to end users. Each member of the set is a /type/key value that specifies
a namespace object and a name within the namespace. Metaweb guarantees
that no two objects will ever have the same fully-qualified name.
guid Every object in a Metaweb database has a globally unique identifier or guid.
The guid property specifies the unique identifier for an object. A guid is a long
string of hexadecimal digits following the hash character, and might look like
this: #0801010a40005e838000000000019bd2. No two objects will ever have the
same value of the guid property. This property is read-only.
type This property is the set of types associated with the object. The object can be
viewed as an instance of any of these types. Each type is itself a Metaweb object,
of /type/type.
timestamp This read-only property is a single value of /type/datetime that specifies when
the object was created.
creator This read-only property is a single link to a /type/user object that specifies
which Metaweb user created the object.
A Metaweb database contains an object that represents the human language English. The name
property of this object specifies its human-readable name: "English". Metaweb objects can have
only a single name in each language. Our English object might have names "Anglais" and "Ingles"
in French and Spanish. It is important to understand that the human-readable name of an object
does not uniquely identify it: there may be many other Metaweb objects with the name "English".
Because the name property allows only one name in each language, you cannot use it to specify
nicknames for an object. You cannot, for example, give the English object the name "American
English" in addition to "English". As we'll see below, most Metaweb objects that are intended
for display to end-users are instances of a type called /common/topic. This type defines a property
named alias, which you can use to specify any number of nicknames for an object.
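The one-name-per-language rule and the alias property can be pictured with a small in-memory model. This Python sketch is an assumption-laden illustration, not Metaweb's implementation: the class Topic and its methods are hypothetical, the language ids /lang/fr and /lang/es are assumed by analogy with /lang/en, and having a second name in the same language replace the first is a choice made for the sketch.

```python
class Topic:
    """Toy model of an object with per-language names and free-form aliases."""

    def __init__(self):
        self.names = {}    # language id -> the single name in that language
        self.aliases = []  # any number of nicknames

    def set_name(self, lang, text):
        # Assumption: setting a name replaces any existing name in that language.
        self.names[lang] = text

    def name(self, lang="/lang/en"):
        # Return the name in the requested language, or None if there is none.
        return self.names.get(lang)

english = Topic()
english.set_name("/lang/en", "English")
english.set_name("/lang/fr", "Anglais")  # /lang/fr is an assumed id
english.set_name("/lang/es", "Ingles")   # /lang/es is an assumed id

# A nickname cannot be a second English name, so it goes in the alias list.
english.aliases.append("American English")

print(english.name(), english.name("/lang/fr"), english.aliases)
```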
The key property of the English object is completely different from the name property. It specifies
that the object has the name "en" in a particular namespace object. That namespace object has a
key property of its own, which specifies that it has the name "lang" in a special root namespace
object. Metaweb uses the slash character to delimit names, so the English object has the fully-
qualified name "/lang/en". Fully-qualified names are intended for developers and are often used
in code, so we'll usually write them in code font like this: /lang/en.
The critical thing about fully-qualified names is that they are unique. Metaweb ensures that no
two objects ever have the same fully-qualified name at the same time.
Human-readable names and fully-qualified names are optional; Metaweb objects are not required
to have either. But every object does have a guid value that identifies it uniquely. A unique guid
is assigned to an object when it is created, and it never changes. It is always possible to uniquely
identify an object by specifying the value of its guid property. The guid of the /lang/en object
is "#9202a8c04000641f8000000000000092".
Guids and fully-qualified names are both unique identifiers for objects. The id property is flexible
and allows you to use either one. If you want to refer to the English object, you could specify an
id property of "#9202a8c04000641f8000000000000092" or "/lang/en".
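The two forms of id can be illustrated with a toy resolver. In the Python sketch below, namespaces are modeled as nested dictionaries whose leaves are guids. This is a simplification made for the example (in Metaweb, namespaces are themselves objects), and the function name resolve_id is hypothetical.

```python
# Toy namespace tree: the root namespace holds "lang", which holds "en".
NAMESPACES = {
    "lang": {
        "en": "#9202a8c04000641f8000000000000092",
    },
}

def resolve_id(identifier, root=NAMESPACES):
    """Return the guid for an id given as a guid or a fully-qualified name."""
    if identifier.startswith("#"):
        return identifier  # already a guid
    node = root
    # Walk one namespace level per slash-delimited component.
    for part in identifier.strip("/").split("/"):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(identifier)
        node = node[part]
    return node

print(resolve_id("/lang/en"))
print(resolve_id("#9202a8c04000641f8000000000000092"))
```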
2.1.3. Topics
Objects that are displayed to users of freebase.com are called topics. These are regular Metaweb
objects that are members of the type /common/topic in addition to any of their other,
more-specific types. /common/topic defines properties that allow descriptions, nicknames, documents and
images to be associated with an object, and the freebase.com client uses these properties to
assemble an informative web page that describes the object or topic.
2.2. Values
Like many object-oriented programming languages, Metaweb draws a distinction between objects
(arbitrary collections of properties) and values (single primitives such as numbers, dates and
strings). Metaweb defines nine value types. Like all Metaweb types, value types are identified
by type objects. Each type object has a fully-qualified name such as /type/int (for the value
type that represents integer values).
Values have a dual nature in Metaweb. Depending on how you ask about them, they may behave
like primitives, or like simple objects. If you query a value as if it were an object, then it behaves
like a simple object with two properties (as we'll see shortly, two of the value types actually include
a third property as well):
type This property refers to the type object that specifies the type of the value.
value This property holds the primitive value itself.
If you query a value as a primitive, then just the value of the value property is returned.
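The dual nature of values can be modeled in a few lines. This Python sketch is only a picture of the idea: it does not show MQL syntax, and the function read_value and its as_object flag are hypothetical.

```python
def read_value(stored, as_object=False):
    """Return a stored value either as a primitive or as a simple object."""
    if as_object:
        # Object view: a copy of the simple object with its properties.
        return dict(stored)
    # Primitive view: just the contents of the value property.
    return stored["value"]

stored = {"type": "/type/int", "value": 42}

print(read_value(stored))
print(read_value(stored, as_object=True))
```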
The various Metaweb value types are described in the sub-sections that follow. Notice that value
types are in the /type domain, and that their names fall under the /type namespace. (We'll see
more about namespaces in Section 2.5.)
2.2.1. /type/int
Values of this type are signed integers. Metaweb uses a 64-bit representation internally, which
means that the range of valid values of /type/int is from -9223372036854775808 to
9223372036854775807. An integer literal is simply an optional minus sign followed by a sequence
of decimal digits. Metaweb does not support octal or hexadecimal notation for integers, nor does
it allow the use of exponential notation for expressing integers.
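The literal syntax and range described above are simple to check before sending a value to Metaweb. The Python sketch below is a hypothetical client-side validator (the name parse_metaweb_int is not part of any Metaweb API). It accepts leading zeros because the text only requires "a sequence of decimal digits".

```python
import re

# An optional minus sign followed by ASCII decimal digits, nothing else.
INT_LITERAL = re.compile(r"-?[0-9]+")

INT_MIN = -2**63      # -9223372036854775808
INT_MAX = 2**63 - 1   #  9223372036854775807

def parse_metaweb_int(literal):
    """Parse a /type/int literal, rejecting other notations and overflow."""
    if not INT_LITERAL.fullmatch(literal):
        raise ValueError("not a decimal integer literal: %r" % literal)
    value = int(literal)
    if not INT_MIN <= value <= INT_MAX:
        raise ValueError("outside the 64-bit signed range: %r" % literal)
    return value

print(parse_metaweb_int("-9223372036854775808"))
```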
2.2.2. /type/float
Values of this type are signed numbers that may include an integer part, a fractional part, and an
order of magnitude (a power of ten by which the integer and fractional parts are multiplied).
Metaweb uses the 64-bit IEEE-754 floating point representation which supports magnitudes
between 10^-324 and 10^308. C and Java programmers may recognize this as the double datatype.
Metaweb does not support the special values Infinity and NaN, however.
A literal of /type/float consists of an optional minus sign, an optional integer part, an optional
decimal point and fractional part, and an optional exponent. The integer and fractional parts are
There are an infinite number of real numbers, and a 64-bit representation can only describe a finite
subset of them. Any number with 12 or fewer significant digits can be stored and retrieved exactly
with no loss of precision. Numbers with more than 12 significant digits may have those digits
truncated when they are stored in Metaweb.
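A client can check both restrictions before writing a number. The Python sketch below uses hypothetical helper names. Note one assumption: it rounds to 12 significant digits, whereas the text above says extra digits may be truncated, so it only approximates what Metaweb does with longer numbers.

```python
import math

def check_metaweb_float(x):
    """Reject the special values that /type/float does not support."""
    if not math.isfinite(x):
        raise ValueError("Infinity and NaN are not supported")
    return x

def survives_12_digits(x):
    """True if x is unchanged after being reduced to 12 significant digits."""
    # Assumption: rounding (format with .12g) stands in for truncation.
    return float(format(x, ".12g")) == x

print(survives_12_digits(0.1))              # short decimal
print(survives_12_digits(1.0 / 3.0))        # needs more than 12 digits
print(survives_12_digits(1234567890123.0))  # 13 significant digits
```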
2.2.3. /type/boolean
There are only two values for this type; they represent the boolean truth values true and false.
Note that Metaweb sometimes uses the absence of a value (null) in place of false.
2.2.4. /type/id
Values of this type are object identifiers, either guids or fully-qualified names. The object
properties guid and id have values of this type.
2.2.5. /type/text
An instance of /type/text is a string of text plus a value that specifies the human language of
that text. The name property of an object is a set of values of this type.
/type/text is unusual. Its value property specifies the text itself, but it also has a lang property
that specifies the language in which the text is written. The lang property refers to an object of
type /type/lang. The /lang namespace holds many instances of this type, such as /lang/en for
English. We'll say more about /type/lang and the /lang namespace later in this chapter.
The text of a /type/text value must be a string of Unicode characters, encoded using the UTF-
8 encoding. The encoded string must not occupy more than 4096 bytes. Longer chunks of text
(or binary data) can be stored in Metaweb in the form of a /type/content object, which is
described later.
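Because the limit applies to the encoded bytes rather than to the number of characters, it is worth measuring the UTF-8 form explicitly. A minimal Python sketch follows. The helper names are hypothetical, and the dictionary layout simply mirrors the value and lang properties described above.

```python
MAX_TEXT_BYTES = 4096  # limit on the UTF-8 encoded form, as stated above

def fits_in_type_text(text):
    """True if the UTF-8 encoding of text is within the 4096-byte limit."""
    return len(text.encode("utf-8")) <= MAX_TEXT_BYTES

def make_text(text, lang="/lang/en"):
    """Build a dictionary modeled on a /type/text value."""
    if not fits_in_type_text(text):
        raise ValueError("text exceeds %d bytes when UTF-8 encoded" % MAX_TEXT_BYTES)
    return {"type": "/type/text", "value": text, "lang": lang}

# Each "é" takes two bytes in UTF-8, so 2048 fit but 2049 do not.
print(fits_in_type_text("é" * 2048), fits_in_type_text("é" * 2049))
```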
2.2.6. /type/key
Instances of /type/key represent a fully-qualified name. The key property of an object is a set
of /type/key values. The value property of a /type/key value is the local, or unqualified part
of a fully-qualified name. Like /type/text, /type/key has a third property. The namespace
property of a key refers to the /type/namespace object that qualifies the local name. The
namespace property and the value property combine to produce a fully-qualified name.
The value property of a key must be a string of ASCII characters, and may include letters,
numbers, underscores, hyphens and dollar signs. A key may not begin or end with a hyphen or
underscore. The dollar sign is special: it must be followed by four hexadecimal digits (using letters
A through F, in uppercase), and is used when it is necessary to map Unicode characters into
ASCII so that they can be represented in a key. To represent an extended Unicode character (that
does not fit in four hexadecimal digits), encode that character in UTF-16 using a surrogate pair,
and then express the surrogate pair using two dollar-sign escapes.
Keys used as names for domains, types and properties are further restricted: they may not include
hyphens or dollar signs, and may not include two underscores in a row.
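The dollar-sign escape can be sketched as follows. This Python example assumes that every character outside the allowed set (including the dollar sign itself) is escaped. The manual describes the escape format but not a complete encoding procedure, so encode_key and is_valid_key are hypothetical helpers rather than Metaweb functions. The sketch does not apply the stricter rules for domain, type, and property names.

```python
import re

PLAIN = re.compile(r"[A-Za-z0-9_-]")
# A key is a run of plain characters and $XXXX escapes (uppercase hex only).
KEY_PATTERN = re.compile(r"(?:[A-Za-z0-9_-]|\$[0-9A-F]{4})+")

def encode_key(text):
    """Escape characters that are not allowed in a key as $XXXX sequences."""
    out = []
    for ch in text:
        if PLAIN.fullmatch(ch):
            out.append(ch)
            continue
        # UTF-16 yields one 16-bit unit, or a surrogate pair for extended
        # characters; each unit becomes one dollar-sign escape.
        units = ch.encode("utf-16-be")
        for i in range(0, len(units), 2):
            out.append("$%04X" % int.from_bytes(units[i:i + 2], "big"))
    return "".join(out)

def is_valid_key(key):
    """Check the character rules and the begin/end restriction."""
    if not KEY_PATTERN.fullmatch(key):
        return False
    return key[0] not in "-_" and key[-1] not in "-_"

print(encode_key("café"))
print(encode_key("\U0001D11E"))  # a character outside the 16-bit range
```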
2.2.7. /type/rawstring
A value of /type/rawstring is a string of bytes with no associated language specification. The
length of the string must not exceed 4096 bytes.
Use /type/rawstring instead of /type/text for small amounts of binary data and for textual
strings that are not intended to be human readable.
2.2.8. /type/uri
An instance of /type/uri represents a URI (Uniform Resource Identifier: see RFC 3986). The
value property holds the URI text, which should consist entirely of ASCII characters. Any non-
ASCII characters, and any characters that are not allowed in URIs should be URI-encoded using
hexadecimal escapes of the form %XX to represent arbitrary bytes.
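In Python, the standard library's urllib.parse.quote performs this kind of escaping. The sketch below is one possible approach, not a Metaweb requirement: the helper name to_ascii_uri is hypothetical, and the set of characters left unescaped (the RFC 3986 reserved characters plus the percent sign, so that existing escapes are preserved) is a choice made for the example.

```python
from urllib.parse import quote

# Reserved URI characters, plus "%" so existing escapes are left alone.
SAFE = ":/?#[]@!$&'()*+,;=%"

def to_ascii_uri(uri):
    """Percent-encode non-ASCII and disallowed characters as UTF-8 bytes."""
    return quote(uri, safe=SAFE)

print(to_ascii_uri("http://example.com/café"))
print(to_ascii_uri("http://example.com/a b"))
```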
2.2.9. /type/datetime
An instance of /type/datetime represents an instant in time. That instant may be as long as a
year or as short as a fraction of a second. The value property is a string representation of a date
and time formatted according to a subset of the ISO 8601 standard. /type/datetime only supports
dates specified using month and day of month. It does not support the ISO 8601 day-of-year,
week-of-year and day-of-week representations.
A /type/datetime value that represents the first millisecond of the 21st century looks like this:
2001-01-01T00:00:00.001Z
• Longer intervals of time (years, months, etc.) are specified before shorter intervals (minutes,
seconds, etc.).
• Months and days must always be specified with two digits, starting with 01, even when the
first digit is a 0.
• The components of a date are separated from each other with hyphens.
• A date is separated from the time that follows with a capital letter T.
• Times are specified using a 24-hour clock. Midnight is hour 00, not hour 24. Hours and minutes
must be specified with two digits, even when the first digit is 0.
• Seconds must be specified with two digits, but may also include a decimal point and a fractional
second. Metaweb allows up to 9 digits after the decimal point.
• The hours, minutes, and seconds components of a time specification are separated from each
other with colons.
• A time may be followed by a timezone specification. The capital letter Z is special: it specifies
that the time is in Universal Time, or UTC (formerly known as GMT). Local timezones that
are later than UTC (east of the Greenwich meridian) are expressed as a positive offset of hours
and minutes such as +05:30 for India. Local times earlier than UTC are expressed with a
negative offset such as -08:00 for US Pacific time. If no timezone is specified, then the
/type/datetime value is assumed to be a local time in an unknown timezone. Specifying a
timezone of +00:00 is the same as specifying Z. Specifying -00:00 is the same as omitting the
timezone altogether.
• All characters used in the /type/datetime representation are from the ASCII character set,
so date and time values can be treated as strings of 8-bit ASCII characters.
A /type/datetime value can represent time at various granularities, and any of the date or time
fields on the right-hand side can be omitted to produce a value with a larger granularity. For
example, the seconds field can be omitted to specify a day, hour, and minute. Or all the time fields
and the day-of-month field can be omitted to specify just a year and a month. Also, the date fields
can be omitted to specify a time that is independent of date. A timezone may not be appended to
a date alone: there must be at least an hour field specified before a timezone.
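The rules above translate fairly directly into a regular expression. The following Python sketch is a partial validator written for illustration: the name is_valid_datetime is hypothetical, it assumes a four-digit year, and it covers only values that begin with a date (time-only values are left out because their exact written form is not shown here). It checks field ranges but not month lengths or timezone offset ranges.

```python
import re

# Each optional group nests inside the previous one, so fields can only be
# dropped from the right-hand side. A timezone requires at least an hour.
DATETIME = re.compile(
    r"(?P<year>[0-9]{4})"
    r"(?:-(?P<month>[0-9]{2})"
    r"(?:-(?P<day>[0-9]{2})"
    r"(?:T(?P<hour>[0-9]{2})"
    r"(?::(?P<minute>[0-9]{2})"
    r"(?::(?P<second>[0-9]{2})(?:\.[0-9]{1,9})?)?)?"
    r"(?P<tz>Z|[+-][0-9]{2}:[0-9]{2})?"
    r")?)?)?"
)

LIMITS = {
    "month": (1, 12),
    "day": (1, 31),
    "hour": (0, 23),    # midnight is hour 00, not hour 24
    "minute": (0, 59),
    "second": (0, 59),
}

def is_valid_datetime(text):
    """Check text against the date-first subset of /type/datetime."""
    match = DATETIME.fullmatch(text)
    if match is None:
        return False
    for field, (low, high) in LIMITS.items():
        value = match.group(field)
        if value is not None and not low <= int(value) <= high:
            return False
    return True

for sample in ["2001-01-01T00:00:00.001Z", "2001-01", "2001-01-01Z"]:
    print(sample, is_valid_datetime(sample))
```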
Here are some example /type/datetime values that demonstrate the allowed formats:
Types
+--Value Types
|  +--/type/id
|  +--/type/int
|  +--/type/float
|  +--/type/boolean
|  +--/type/text
|  +--/type/rawstring
|  +--/type/uri
|  +--/type/datetime
|  +--/type/key
+--Object Types
   +--Freebase Types
   |  +--/restaurant domain
   |  +--/location domain
   |  +--/film domain
   |  +--/music domain
   |  |  +--/music/track
   |  |  +--/music/album
   |  |  +--/music/artist
   |  +--/book domain
   |  +--etc.
   +--Core Types (/type domain)
   +--Common Types (/common domain)
   +--User-defined Types
The sub-sections that follow introduce the most important core and common types. You do not
need to understand these types in detail in order to make productive use of Metaweb. Still,
knowing what these basic types are is a helpful orientation to the system.
2.3.1.1. /type/object
Earlier in this chapter, we explained that all Metaweb objects share a set of common properties:
name, id, key and so on. These universal object properties are defined by a core type named
/type/object. If you are an object-oriented programmer familiar with languages such as Java,
you might guess that /type/object is the root of the type hierarchy, and that it is the superclass
of all other object types.
In fact, however, Metaweb does not have a type hierarchy. Types do not have supertypes.
/type/object is not a normal type. Objects are never declared to be instances of this type. Re-
member that one of the common object properties is type: it specifies a set of types for the object.
/type/object never needs to be a member of this set. In fact, an object's set of types can be
empty, and the object will still have all of the common properties. The /type/object type exists
simply as a convenient placeholder. It serves to group the /type/property objects that represent
the common object properties.
2.3.1.2. /type/type
This type describes a type, which means that it is the only type that is an instance of itself. Types
have five properties:
instance The set of instances of the type. For commonly used types, this
set may grow quite large. Recall, however, that all relationships
between objects in Metaweb are inherently bi-directional. Since
every object has a type property that refers to its type, it follows that
every type has a set of incoming links from its instances. Thus, every
type automatically maintains a set of its instances.
default_property The name of the default property for the type. When you ask Metaweb
to return an object as if it were a primitive value, Metaweb returns the
value of the default property for that type. For value types, the default
property is value. For most object types the default property is name.
And for core types in the /type domain, the default property is id.
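The rule of thumb in the default_property description can be written down as a small Python sketch. It is an approximation for orientation only: the real default is whatever each type's default_property says, the text itself says name applies to "most" object types, and the function name and the VALUE_TYPES set (copied from the type diagram earlier in this section) are assumptions of this sketch.

```python
# Value types as listed in the type diagram; an assumption of this sketch.
VALUE_TYPES = {
    "/type/id", "/type/int", "/type/float", "/type/boolean", "/type/text",
    "/type/rawstring", "/type/uri", "/type/datetime", "/type/key",
}


def guess_default_property(type_id: str) -> str:
    """Guess the default property of a type from its id (hypothetical helper)."""
    if type_id in VALUE_TYPES:
        return "value"   # value types default to their value
    if type_id.startswith("/type/"):
        return "id"      # core object types in the /type domain default to id
    return "name"        # true for most, not all, other object types
```

For example, guess_default_property("/music/artist") yields "name", which is why a query such as "artist" : "The Police" can identify an artist by name alone.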
2.3.1.3. /type/property
Every type defines a set of properties for its instances. The members of this set are /type/property
objects. The common name and key properties of a property object specify the human-readable
• Whether the property is unique. A unique property may only have a single value (or may have
no value). A property that is not unique has a set of zero or more values.
The notion of a reciprocal property deserves more explanation. Recall that all links in Metaweb
are bi-directional. This means that any time a property of type A refers to an object of type B,
Metaweb automatically has a link from that object of type B back to the originating object of
type A. Type B can take advantage of this bi-directionality and include a property that links back
to objects of type A. As a concrete example, consider the properties property of /type/type:
it specifies the set of properties for a type. Its reciprocal is the schema property of /type/property,
which specifies the type object (or "schema") of which the property is a part. You'll find further
exploration of reciprocal properties in Chapter 5.
2.3.1.4. /type/domain
A domain represents a set of related types, and also serves as a namespace for those types. For
access control purposes, each domain object refers to one or more usergroup objects that "own"
the domain. Only members of the specified usergroups are allowed to add new types to the domain
or to edit types within the domain.
2.3.1.5. /type/namespace
This type represents a namespace, and is used by the value type /type/key. It defines the keys
property which is a set of /type/key values that specify the names in the namespace.
/type/content
Large chunks of content, such as HTML documents and graphical images are not stored in
regular Metaweb nodes. Instead, these large objects (sometimes called lobs) are kept in a
separate store. A /type/content object is the bridge between the Metaweb object store and
the Metaweb content store. A /type/content object represents an entry in the content store,
and the guid of the /type/content object is used as an index for retrieving the content.
In addition to providing access to the content store, /type/content defines important prop-
erties. The media_type property specifies the MIME type of the content. For textual content,
the text_encoding and language properties specify the encoding and language of the text.
The length property specifies the size (in bytes) of the content. The source property refers
to a /type/content_import object that specifies the source of the content.
/type/content_import
This type describes the source of imported content. Its properties include the URI or filename
from which the content was obtained, the user who imported the content, and a timestamp
that specifies when the content was imported.
/type/media_type
Instances of this type represent a MIME media type such as "text/html" or "image/png". In-
stances are given fully-qualified names within the /media_type namespace, and can be spe-
cified with ids like /media_type/text/html or /media_type/image/png.
/type/text_encoding
Instances of this type represent standard text encodings, such as ASCII and Unicode UTF-8.
Instances are given fully-qualified names within the /media_type/text_encoding namespace,
and can be specified with ids such as /media_type/text_encoding/ascii.
/type/lang
This type represents a human language. It is used by /type/content objects and also by
/type/text values. Pre-defined instances of this type are given fully-qualified names within
the /lang namespace, and can be specified with ids like /lang/en and /lang/fr.
/common/topic
As described earlier in this chapter, Metaweb objects that are intended for display to end
users are called "topics". Such objects typically have some appropriate domain-specific type,
such as /music/artist or /food/restaurant, but are also instances of the type /common/topic.
This type defines properties that allow documents and images to be associated with the
topic. Another property allows a set of URLs to be associated with the topic. Also, because
objects can only have a single name in any given language, /common/topic has an alias
property that allows any number of nicknames to be specified for the topic.
/common/document
This type represents a document of some sort. /common/topic uses this type to associate
documents with topics. The most important property is content, which specifies the single
/type/content object that refers to the document content. Other properties of /common/document
provide meta-information about the document, such as authors, publication date, and
so on.
/common/image
/type/content objects that represent images are typically co-typed with this type.
/common/image defines a size property that specifies the pixel dimensions of the image.
/type/usergroup
This type represents a set of users.
/type/permission
This type is the key to Metaweb access control. Its properties specify the set of objects that
require this permission for modifications, and also the set of usergroups that have the permis-
sion. See Section 2.6 for further details.
2.4. Domains
A domain is a Metaweb object of /type/domain. It represents a collection of related types. We've
already seen a number of types from the /type and /common domains. freebase.com pre-defines
types in a number of general domains, and Chapter 3 and Chapter 4 feature many examples using
the Freebase /music domain. The set of Freebase domains is expected to grow, but at the time
of this writing, it includes:
As you might guess from the names of these domains, domain objects are also instances of
/type/namespace, and the types contained by domains are members of both the domain and the
namespace.
Every Metaweb user who registers for an account has their own domain. If your Metaweb username
is fred, then your domain is /user/fred/default_domain. When you use the freebase.com client
to define a new type named Beer, it is given the id /user/fred/default_domain/beer. If your
type becomes an important and commonly used one, it may be promoted by Metaweb adminis-
trators to a top-level domain. In this case, your type might be given a new fully-qualified name
like /zymurgy/beer.
2.5. Namespaces
Namespaces are a critical part of the Metaweb infrastructure because they allow us to refer to
important objects, such as types, with simple mnemonic names rather than opaque guids. It would
be very inconvenient to query Metaweb if we had to write
"#9202a8c04000641f8000000000000565" instead of "/common/topic", for example.
We've already learned about a number of important namespaces including /type, /user, /lang,
and /media_type. In addition to these, each domain and user object is also a namespace. Also,
there is the root namespace, whose id is simply /.
Metaweb's access control model is quite simple. Every object has a permission property that
refers to a /type/permission object. The permission object specifies a set of usergroups whose
members have permission to modify the object. If a user is a member of one or more of the spe-
cified groups, then that user can edit the object. Otherwise, the user is not allowed to.
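The check described above amounts to a set-membership test. The following Python sketch models it with plain dictionaries; the function name can_edit and the data layout are hypothetical and say nothing about how Metaweb itself stores permissions.

```python
def can_edit(user, permission_usergroups, members):
    """Return True if `user` belongs to any usergroup named by the permission.

    permission_usergroups: usergroup names listed on the object's permission.
    members: dict mapping a usergroup name to the set of its usernames.
    Both structures are assumptions made for this illustration.
    """
    return any(user in members.get(group, set())
               for group in permission_usergroups)
```

With members = {"fred_group": {"fred"}}, can_edit("fred", ["fred_group"], members) is true and can_edit("jill", ["fred_group"], members) is false until jill is added to the group.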
This simple access control model is, by default, also very open. In order to allow and encourage
free collaboration most Metaweb objects have a permission object that gives edit permission to
all Metaweb users. If Metaweb user Fred creates a new object, his friend Jill can freely edit that
object. Any other Metaweb user can edit the object as well, and there is no way for Fred to restrict
the permission on his object.
The primary exception to this open access control model is type objects. Having a stable type
system is very important. Each domain has a usergroup associated with it, and only members of
that usergroup can create new types in the domain or alter existing types in the domain. Each
user account has an associated domain. Fred's domain is /user/fred/default_domain. This
domain has an associated usergroup. Initially, Fred is the only member of this group. He is allowed
to add to the usergroup, and if he adds his friend Jill, then she is permitted to create new types
in Fred's domain.
Other key parts of the Metaweb infrastructure also have restrictive access control, of course.
Ordinary users are not allowed to insert objects into the /lang namespace or the /type domain,
for example.
This chapter teaches you to write MQL queries, but does not explain how to issue those queries
to and retrieve responses from Metaweb servers: that is the topic of Chapter 4. Also, this chapter
does not cover updates, or writes, to Metaweb. Updates are expressed using a variant of MQL
that is covered in Chapter 5.
1
You should read this section even if you already know JavaScript. JSON is only a subset of JavaScript, and its syntax is stricter than
JavaScript syntax.
2
The JSON syntax diagrams that appear below are also from the JSON website, where they have been placed in the public domain.
A JSON-formatted string is a serialized form of an array or object. The array or object may contain
numbers, strings, other arrays and objects, and the literal values null, true, and false. These
JSON values are illustrated in Figure 3.1 and explained in the sub-sections that follow:
3
JSON itself supports 32-bit, 16-bit and 8-bit encodings of Unicode text. Metaweb, however, requires the 8-bit UTF-8 encoding.
Escape    Character
\"        A quotation mark that does not terminate the string
\\        A single backslash character that is not an escape
\/        A forward slash character. Although it is legal to escape the forward slash
          character, it is never necessary to do so.
\b        The Backspace character
\f        The Formfeed character
\n        The Newline character
\r        The Carriage Return character
\t        The Tab character
\uXXXX    The Unicode character whose encoding is the four hexadecimal digits XXXX. To
          encode extended Unicode codepoints that do not fit in four hex digits, use two
          \u escapes to encode a UTF-16 surrogate pair.
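To see these escapes in practice, here is a short sketch that uses Python's standard json module (our choice for illustration; any conforming JSON library should behave the same way). The helper names encode and decode are hypothetical wrappers around json.dumps and json.loads.

```python
import json


def encode(text: str) -> str:
    """Serialize text as a JSON string literal that uses only ASCII characters."""
    return json.dumps(text, ensure_ascii=True)


def decode(literal: str) -> str:
    """Parse a JSON string literal back into text."""
    return json.loads(literal)
```

Encoding a tab produces the two characters \t, and encoding a codepoint outside the four-hex-digit range, such as U+1D11E, produces the surrogate pair \ud834\udd1e. Decoding the literal "\/" yields a plain forward slash, which shows that the escape is legal but unnecessary.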
Arrays may contain any JSON values, including objects and other arrays. The elements of a JSON
array need not have the same type (though in MQL they always do). The following JSON array
might be returned in response to a MQL query:
A JSON array with no elements consists of just the square brackets: []. Empty arrays often appear
in MQL queries.
• an associative array;
• a dictionary; or
JSON objects are written as a comma-separated list of name/value pairs, enclosed in curly braces.
A name/value pair is a JSON string (the name) followed by a colon followed by any JSON value,
which may include nested objects and arrays. See Figure 3.4
{
"type" : "/music/artist",
"name" : "The Police",
"album" : []
}
JavaScript programmers should note that JSON requires property names to appear within double
quotes, even though the JavaScript language does not. Arbitrary whitespace is allowed within
JSON objects and arrays, but trailing commas (after the final array element or last name/value
pair) are not. An empty JSON object, with no properties at all is simply a pair of curly braces:
{}. As we'll see, empty objects are not uncommon in MQL queries.
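The strictness rules above are easy to check with a real parser. This sketch uses Python's standard json module; the helper name is_valid_json is hypothetical.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if `text` parses as JSON, False if the parser rejects it."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False
```

The parser accepts {"album": []} and {}, but rejects {album: []} (unquoted property name) and {"album": [],} (trailing comma), even though a JavaScript interpreter would typically tolerate the unquoted name.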
<html>
<head><title>Metaweb Query Editor</title>
<style> /* CSS styles for nice output */
#q, #r { width: 400px; height: 300px; border-width:0px;padding:5px;}
th {background-color:black; color:white; font:bold 12pt sans-serif;}
td.border,th {border:solid black 3px;}
input { margin: 5px; font-weight: bold; }
table { border-collapse: collapse;}
</style>
</head>
<body>
<!-- Form makes an HTTP GET request to mqlread, results go in iframe r -->
A MQL query is a JSON object. In order to get a Metaweb server to execute the query,
however, you must nest it inside two more JSON objects, which are known as the "envelope".
We'll explain envelopes in detail in Section 4.2.3 when we're explaining how to send MQL
queries to Metaweb. For now, however, you just need to know enough about envelopes so
that you can try out queries in query editors.
In order to execute a query in the query editor of Example 3.1, you must put this text before
your query:
{"qname":{"query":
The text "qname" is actually an arbitrary name for the query; you can use any name you
want here. The text "query" must be entered verbatim. Whitespace is optional in JSON, so
you can of course insert spaces and newlines into that text to make it look nice. Since the
envelope is a JSON object, you must provide the matching closing braces. This means that
you must follow your query with:
}}
Metaweb's responses to MQL queries are also JSON objects and, like the queries, are wrapped
within envelope objects. If you use the query editor in Example 3.1, you'll see the complete
response envelope object, and you'll find the MQL query result as the value of a property
named result in an object that is itself the value of a property named qname (or whatever
query name you choose in the query envelope). If you use the Freebase query editor, you
have the option of viewing just the result or the complete response envelope. In this chapter,
we omit the response envelopes except when we need to look at error messages that appear
in the envelope.
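If you build queries in a program rather than typing them into an editor, the envelope handling described above can be sketched as follows. This is a minimal Python illustration that only manipulates JSON text; it does not contact a server, and the function names wrap_query and unwrap_result are hypothetical.

```python
import json


def wrap_query(query, qname="qname"):
    """Nest a query inside the two envelope objects and serialize it."""
    return json.dumps({qname: {"query": query}})


def unwrap_result(response_text, qname="qname"):
    """Pull the query result out of a serialized response envelope."""
    envelope = json.loads(response_text)
    # The result is a property of the object stored under the query name.
    return envelope[qname]["result"]
```

The qname argument is the arbitrary query name; whatever name you use when wrapping the query is the name to look under when unwrapping the response.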
{
"type" : "/music/artist",
"name" : "The Police",
"album" : []
}
Once we extract the result from its envelope, we're left with the following JSON object (some
of the album names are omitted here for brevity):
{
"type": "/music/artist",
"name": "The Police",
"album": [
"Outlandos d'Amour",
"Reggatta de Blanc",
"Zenyatta Mondatta",
"Ghost in the Machine",
"Synchronicity"
]
}
To query Metaweb we tell it what we already know by specifying properties and their values:
"type" : "/music/artist",
"name" : "The Police",
And then we tell it what we want to know by specifying properties without values:
"album" : []
Sending an empty array in a MQL query tells Metaweb that we'd like to have the array filled in.
Note that the property we query in the example above is named "album" and not "albums",
even though bands like The Police may well have more than one album. This reflects the
underlying nature of Metaweb: the database object that represents The Police has many
links of type "album" that refer to the objects that represent those albums. The type
/music/artist aggregates these many links into a single set of albums, but retains the singular
name "album" because that is the name of the underlying link type.
Also, as we'll see soon, when you want to obtain information (such as a list of tracks) about
one particular album, you specify a single value (instead of an array) for the album property.
In this case, the singular name makes a lot of sense.
Although property names are typically singular, there are exceptions to this rule, and
sometimes you'll see a plural property name.
Query                                   Result
{                                       {
  "type" : "/music/artist",               "type": "/music/artist",
  "name" : "The Police",                  "name": "The Police",
  "album" : []                            "album": [
}                                           "Outlandos d'Amour",
                                            "Live in Boston",
                                            "Reggatta de Blanc",
                                            "Zenyatta Mondatta",
                                            "Ghost in the Machine",
                                            "Synchronicity",
                                            "Every Breath You Take: The Singles",
                                            "Greatest Hits"
                                          ]
                                        }
This symmetry of queries and responses is a fundamental and elegant part of MQL. We'll use
this two-column query/response format throughout the chapter.
This query includes the same name and type as the last query. But instead of specifying an empty
array of albums, it specifies a null id. The null value is our query: this is what we want Metaweb
to fill in. The response looks just like the query, but the null is replaced with an id string.
The ids shown in the online HTML-formatted version of this tutorial are valid freebase.com
guids, and should remain valid in perpetuity. In hardcopy and PDF versions of the tutorial,
guids have been shortened to allow queries and responses to fit side-by-side in the two-
column format shown above. To use these printed guids online, restore them to validity by
inserting 9202a8c0400064 between the # and the digit that follows it.
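The restoration rule is mechanical, so it can be captured in a one-line helper. This Python sketch is only a convenience for readers of the printed edition; the function name restore_guid is hypothetical.

```python
GUID_PREFIX = "9202a8c0400064"  # the digits removed from printed guids


def restore_guid(short_guid: str) -> str:
    """Re-insert the omitted digits between the '#' and the rest of the guid."""
    if not short_guid.startswith("#"):
        raise ValueError("expected a guid that starts with '#'")
    return "#" + GUID_PREFIX + short_guid[1:]
```

Applying the function to a shortened guid printed in this chapter yields the full form to use online.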
{
"id": "#1f800000000006df1b",
"name" : null,
"type" : null
}
We're telling Metaweb what we have (the id) and asking for the values (name and type) that we
don't have. When we submit this query, though, it doesn't work. The response envelope looks
like this:
{
"status": "200 OK",
"qname": {
"status": "/mql/status/error",
"messages": [
{
"status": "/mql/status/result_error",
"info": {
"count": 2,
"result": [
"#1f80000000011ae833",
"#1f8000000000000565"
]
},
The various status properties tell us that something is wrong with the query. The messages[0]
object provides details. Its message property gives us an error message, and its info object
provides details to go with the message. The query.error_inside and path properties tell us
that the error is associated with the type property in our query.
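A program that consumes response envelopes will usually want to turn an error status into an exception. The Python sketch below relies only on the fields visible in the envelope shown above (the inner status and the messages array). The names MQLError and extract_result are hypothetical, and treating any status other than /mql/status/error as success is an assumption of this sketch.

```python
class MQLError(Exception):
    """Raised when the inner envelope reports an error status."""


def extract_result(envelope, qname="qname"):
    """Return the query result, or raise MQLError carrying the messages array."""
    inner = envelope[qname]
    if inner.get("status") == "/mql/status/error":
        # Keep the raw message objects so the caller can inspect
        # their status and info properties.
        raise MQLError(inner.get("messages", []))
    return inner.get("result")
```

A caller can catch MQLError and read exc.args[0][0]["info"] to find, for instance, the count of conflicting values behind a uniqueness error.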
What we learn from this response is that Metaweb could not respond to our query because we
asked for a single type and it found two types. Let's try the query again. Now we're requesting a
single name and an array of types for this uniquely specified object. This query works:
Query                                   Result
{                                       {
  "id":"#1f800000000006df1b",             "id":"#1f800000000006df1b",
  "name" : null,                          "name" : "The Police",
  "type" : []                             "type" : [
}                                           "/common/topic",
                                            "/music/artist"
                                          ]
                                        }
The Metaweb object we asked about has the name "The Police" and it is a member of two types:
/common/topic and /music/artist. Recall from Chapter 2 that /common/topic is a very generic
type. Just about every Metaweb object that represents something an end user would have an interest
in is a member of this type. The lesson to draw here is that objects almost always have more than
one type, and any queries on the type property should use arrays. In general, it is always safe to
use [] in place of null in your queries. If there is only one result the array returned in the response
will simply have a single element. When you know that there can only be one result, however,
it is usually more convenient and efficient to use null.
There is a fundamental asymmetry to MQL: when we query the type of an object, we get
an array of types. But when we look up an object by type, we specify only one type. Metaweb
objects have a set of types, not one single type. So when we specify the type of an object
in a MQL query, all we are saying is that the object has at least one "type" link with that
Uniqueness errors are a common pitfall for developers crafting Metaweb queries. Recall that
/type/property allows certain properties to be specified as unique. id is unique: no object can
have more than one id. The name property behaves as if it is unique (but is only unique per
language). As we've seen, however, the type property is not unique: objects can (and most objects
do) have more than one type. If a property is not guaranteed to be unique, then you should always
use square brackets when querying its value.
The id property is unique in another way. As we've seen, no object can have more than one id.
More importantly, however, no two objects share the same id. If a query includes an
id, you can therefore be confident that no more than one object will match, so a query like this
one is correct:
{
"id": "#1f800000000006df1b",
"name" : null,
"type" : []
}
Recall that an object can have only one name in any given language, and that the name property
behaves like a unique property even though it is not really. For this reason, it is always safe to
query name with null, as we do above, rather than [].
On the other hand, the query that we started this tutorial with is risky:
{
"type" : "/music/artist",
"name" : "The Police",
"album" : []
}
This query worked for us: Freebase only knows about one musical artist named "The Police".
Note, however, that there is no guarantee that this will always be the case. There is nothing to
prevent someone from adding another band named "The Police" to freebase.com. If such an ad-
dition were made, our query would suddenly fail.
Depending on the design of your application, a uniqueness failure in this situation might actually
be exactly what you want. If you get two results when you expected one, then perhaps the right
thing to do is fail and display an error message to the user. On the other hand, you could write
your query more cautiously, using square brackets, so that multiple results can be returned:
[{
"type" : "/music/artist",
"name" : "The Police",
"album" : []
}]
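When a query is wrapped in square brackets like this, the result is an array, and the application decides what zero, one, or several matches mean. The Python sketch below shows one possible policy; the function names as_array_query and select_unique are hypothetical.

```python
def as_array_query(query):
    """Wrap a single-object query in a list so any number of matches is legal."""
    return [query]


def select_unique(results):
    """Return the only match, None if there is no match, or raise if ambiguous."""
    if not results:
        return None
    if len(results) > 1:
        raise LookupError("expected one match, found %d" % len(results))
    return results[0]
```

This moves the uniqueness decision out of Metaweb and into your own code, where you can choose to fail, to ask the user which match was meant, or to process every match.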
Query                                   Result
{                                       {
  "type" : "/music/artist",               "type" : "/music/artist",
  "name" : "The Police",                  "name" : "The Police",
  "album" : {                             "album" : {
    "name" : "Synchronicity",               "name" : "Synchronicity",
    "track" : []                            "track" : [
  }                                           "Synchronicity I",
}                                             "Walking in Your Footsteps",
                                              "O My God",
                                              "Mother",
                                              "Miss Gradenko",
                                              "Synchronicity II",
                                              "Every Breath You Take",
                                              "King of Pain",
                                              "Wrapped Around Your Finger",
                                              "Tea in the Sahara",
                                              "Murder by Numbers"
                                            ]
                                          }
                                        }
The interesting thing about this query is that it includes a nested query. We're asking for an array
of tracks from an album named "Synchronicity" recorded by a band named "The Police".
There are other ways to obtain the same information. Here's another query that gets us the same
data:
{
"type" : "/music/album",
"name" : "Synchronicity",
"artist" : "The Police",
"track" : []
}
Rather than identifying the band first, and then querying an album recorded by that band, this
query goes straight to the album, which it identifies by name and by artist. (It assumes this is
enough to uniquely identify a single album and avoid uniqueness errors!)
{
"id" : "#1f800000000006df1b",
"name" : null,
"type" : []
}
It asks for the name and types of a unique object. Both the name and the individual elements of
the type array are returned as strings. Recall, however, that the name of an object is of /type/text
and that types are of /type/type. /type/text is a value type in the Metaweb object model, but
we can treat values as objects if we want to. Let's modify the query to use {} and [{}] instead
of null and []. {} asks for a single value, expanded as an object, and [{}] asks for an array of
values expanded into objects:
{
"id": "#1f800000000006df1b",
"name" : {},
"type" : [{}]
}
This query fails with a uniqueness error. The object we're querying has more than one name. The
name property behaves specially when queried with null: it returns the value of the name in the
default language. It only works to query name with {} if there is only one name, with no transla-
tions. To make the query work, we ask for both the name and type with [{}]:
Query                                   Result
{                                       {
  "id": "#1f800000000006df1b",            "id":"#1f800000000006df1b",
  "name" : [{}],                          "name":[{
  "type" : [{}]                             "lang":"/lang/fr",
}                                           "type":"/type/text",
                                            "value":"The Police"
                                          },{
                                            "lang":"/lang/en",
                                            "type":"/type/text",
                                            "value":"The Police"
                                          },{
                                            "lang":"/lang/es",
                                            "type":"/type/text",
                                            "value":"The Police"
                                          }],
                                          "type":[{
                                            "id":"/music/artist",
                                            "name":"Musical Artist",
                                            "type":["/type/type","/freebase/type_profile"]
                                          },{
We learn from this query that the name of the specified object is "The Police" in each of several
different languages (some languages have been omitted here). We also learn that the object is of
type /common/topic and /music/artist and that these types have common names (as opposed
to the formal ids that we use in queries) "Topic" and "Musical Artist".
Let's use this query technique to learn more about the tracks on the album Synchronicity. (The
result is truncated for brevity.)
Query                                   Result
{                                       {
  "type" : "/music/album",                "type": "/music/album",
  "name" : "Synchronicity",               "name": "Synchronicity",
  "artist" : "The Police",                "artist": "The Police",
  "track" : [{}]                          "track": [{
}                                           "type": [ "/music/track" ],
                                            "name": "Synchronicity I",
                                            "id": "#1f8000000001275dbb"
                                          },{
                                            "type": [ "/music/track" ],
                                            "name": "Walking in Your Footsteps",
                                            "id": "#1f8000000001275dc2"
                                          },{
                                            "type": [ "/music/track" ],
                                            "name": "O My God",
                                            "id": "#1f8000000001275dc9"
                                          }]
                                        }
This query doesn't actually tell us much about the tracks themselves. We already know the type
of the tracks. The id might be useful in future queries, but it doesn't tell us anything about the
track. The name is useful, but we could have obtained that without using curly braces, just by
querying "track":[].
When you ask Metaweb to fill in empty curly braces for you, it returns all the properties if the
value is a value type. The name property of an object is of /type/text, and querying it with {}
returns all of its properties. If the property is an object type instead of a value type, then Metaweb
returns only the name, type and id properties (all of which are defined by /type/object and are
common to all Metaweb objects). That is, instead of using [{}], we could write out the query
explicitly like this:
What if we want to know absolutely everything freebase.com knows about the tracks on Synchron-
icity? We write the query using a wildcard: 4
{
"type" : "/music/album",
"name" : "Synchronicity",
"artist" : "The Police",
"track" : [{"*":null}]
}
"*" is a wildcard property name. It means "all property names". (Note that it is different from []
which means "all property values".) The type /music/track defines a number of its own properties,
and the expansion of the "*" wildcard also includes the universal properties defined by
/type/object. Here, for example, is what freebase.com knows about the song "Walking in Your
Footsteps":
{
"name":"Walking in Your Footsteps",
"type":["/music/track"],
"id":"#1f8000000001275dc2",
"guid":"#1f8000000001275dc2",
"creator":"/user/mwcl_musicbrainz",
"key":["a2313ee6-ccce-4ced-bc3c-af7d4b06f09f","TRACK179899"],
"permission":"/boot/all_permission",
"timestamp":"2006-12-10T00:39:58.0931Z",
"album":["Synchronicity"],
"length":[216.8],
"lyricist":[],
"lyrics":[],
"song":[],
"artist":null,
"composer":[],
"acquire_webpage":[]
}
If {} gives us too little useful information, and {"*":null} gives us more than we really need,
then we must refine our query to express exactly what it is we would like to know. Here's how
we ask for just the name and length of each of the tracks:
4
We'll return to the topic of wildcards later in this tutorial.
Default properties are not only used when you ask Metaweb to fill in a null or a [] for you. They
are also used when you express the information you already have. Consider the following query:
{
"type" : "/music/album",
"name" : "Synchronicity",
"artist" : "The Police",
"track" : []
}
{
"type" : "/music/album",
The verbose form of the query illustrates the fact that the succinct form relies on default properties.
The name property is of /type/text, whose default property is value. The artist property is of
type /music/artist, whose default property is name.
If the property is of object type, return an object that includes its name, id, and
type properties. In this case, the term {} is equivalent to:
{"name":null,"id":null,"type":[]}.
[{}] Like {}, but return an array of objects instead of a single one.
{"*":null} A query of this form returns an object and all of its properties. The meaning of
"all of its properties" requires some explanation, however. Suppose Metaweb
sees the query "p":{"*":null}. It looks up the property p and determines that
its expected type is t. Then it looks up the type t, and determines what properties
that type defines. Then it expands the wildcard query so that each property of t
(plus the properties of /type/object) is queried with either null or [] depending
on whether it is unique or not.
[{"*":null}] Like {"*":null}, but return an array of objects instead of a single one.
{"*":{}} A query of this form is like {"*":null}, except that when the wildcard is expan-
ded, each property is queried with {} or [{}] instead of null or [].
[{"*":{}}] Like {"*":{}}, but return an array of objects instead of a single one.
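The wildcard expansion described in this list can be modeled in a few lines of Python. This is a sketch of the rule only, not of Metaweb's implementation: the function name expand_wildcard is hypothetical, a type's schema is represented here as a dictionary from property name to a "unique" flag, and the uniqueness flags in OBJECT_PROPERTIES are read off the wildcard response shown earlier in this chapter (null for unique properties, an array otherwise).

```python
# Common /type/object properties mapped to "is unique?".
# The flags are inferred from the example wildcard response; an assumption.
OBJECT_PROPERTIES = {
    "name": True, "type": False, "id": True, "guid": True,
    "creator": True, "key": False, "permission": True, "timestamp": True,
}


def expand_wildcard(type_properties, as_objects=False):
    """Expand {"*": null} (or {"*": {}} when as_objects is True).

    type_properties: dict mapping each property of the expected type
    to True if the property is unique, False otherwise.
    """
    merged = dict(OBJECT_PROPERTIES)
    merged.update(type_properties)
    if as_objects:
        # {"*": {}} queries each property with {} or [{}].
        return {p: ({} if unique else [{}]) for p, unique in merged.items()}
    # {"*": null} queries each property with null or [].
    return {p: (None if unique else []) for p, unique in merged.items()}
```

For example, expand_wildcard({"artist": True, "length": False}) asks for "artist" with null and for "length" with [], alongside the common object properties.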
Here's a simple query to answer this question, along with the freebase.com response:
Query                                   Result
[{                                      [{
  "type":"/music/track",                  "type" : "/music/track",
  "name":"Too Much Information",          "name" : "Too Much Information",
  "artist":null,                          "artist" : "The Police",
  "album":null,                           "album" : "Message in a Box (disc 3)",
  "length":null                           "length" : 222.733
}]                                      },{
                                          "type" : "/music/track",
                                          "name" : "Too Much Information",
                                          "artist" : "The Police",
                                          "album" : "Ghost in the Machine",
                                          "length" : 222.733
                                        }]
You should have no trouble understanding this query. It requests an array of tracks with the
specified name, and asks Metaweb to fill in the artist, album, and length of each track. But there
are other ways to ask for this information. The above track-centric query is simple, but returns
an unordered and unstructured list of tracks. If multiple artists have recorded the same song, we
might like the result to be organized by artist. Here's how to write an artist-centric version of the
query, along with the more structured response from freebase.com:
Query                                   Result
[{                                      [{
  "type":"/music/artist",                 "type" : "/music/artist",
  "name":null,                            "name" : "The Police",
  "album": [{                             "album" : [{
    "name":null,                            "name" : "Ghost in the Machine",
    "track": [{                             "track" : [{
      "name":"Too Much Information",          "name" : "Too Much Information",
      "length": null                          "length" : 222.733
    }]                                      }]
  }]                                      }, {
}]                                          "name" : "Message in a Box (disc 3)",
                                            "track" : [{
                                              "name" : "Too Much Information",
                                              "length" : 222.733
                                            }]
                                          }]
                                        }]
At first glance, it seems as if the only information we're providing to Metaweb with this query
is the track name. But notice that we also explicitly specify the type of the outermost object:
we've said that we want an object of type /music/artist. This is critical, because types have
properties, and properties specify the type of their values. Since we've specified that the outermost
object is /music/artist, Metaweb knows that the middle object is a /music/album (because that
is the type of the /music/artist/album property) and that the inner object is a /music/track
(because that is the type of the /music/album/track property). 5
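Because the artist-centric result is nested three levels deep, client code often needs to walk it. The following Python sketch flattens such a result into (artist, album, track, length) rows. It assumes the result has exactly the shape shown above, and the function name flatten_tracks is hypothetical.

```python
def flatten_tracks(artists):
    """Turn a nested artist/album/track result into flat tuples."""
    rows = []
    for artist in artists:
        for album in artist.get("album", []):
            for track in album.get("track", []):
                rows.append((artist["name"], album["name"],
                             track["name"], track["length"]))
    return rows
```

Applied to the response above, this produces two rows, one for each album on which the track appears, which is the same information the track-centric query returned as a flat list.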
We've answered our question about the song Too Much Information with a track-centric query
and an artist-centric query. For completeness, here is the album-centric query that returns the
same information:
[{
"type":"/music/album",
"name":null,
"artist":null,
"track": [{
"name":"Too Much Information",
"length": null
}]
}]
The critical thing about id is that it is unique: every object's id is different. For objects, such as
types, that are organized into namespaces, the id is a fully-qualified name such as "/music/artist".
For other objects, the id is a guid: a unique, but meaningless, string of hexadecimal digits. Note
that although ids are represented with JSON strings, the id property of /type/object is of
/type/id rather than /type/text or /type/rawstring.
In addition to its guarantees of uniqueness, the id property has some special behavior. Specifically,
the id property cannot be constrained with pattern-matching or comparison operators, and cannot
be used as a sort key. (We'll learn about operators and sorting later in this tutorial.)
The special thing about the name property is that it behaves like a unique property (you can safely
query it with null instead of [], for example) but it is not truly unique. Any Metaweb object can
have multiple names, but may have only one name in any given language. That is, the name
property is unique on a per-language basis. When you query the name of an object, Metaweb
Footnote 5: Yes, properties have fully-qualified names that include the name of the type of which they are a part. We'll see example queries using
fully-qualified names later in this tutorial.
To demonstrate the special behavior of the name property, we must choose a topic that has
translations into other languages. Let's find the freebase.com topic named "Anarchism":
Query:
{
  "type" : "/common/topic",
  "name" : "Anarchism",
  "id":null
}
Result:
{
  "type": "/common/topic",
  "name": "Anarchism",
  "id": "#1f8000000000003b60"
}
Now, let's take this object identified by id, and ask for its name:
Query:
{
  "id":"#1f8000000000003b60",
  "name":null
}
Result:
{
  "id":"#1f8000000000003b60",
  "name":"Anarchism"
}
This simply returns the English name we started with: "Anarchism". Let's ask for all names:
Query:
{
  "id":"#1f8000000000003b60",
  "name":[]
}
Result:
{
  "id":"#1f8000000000003b60",
  "name":["Anarchism"]
}
This query just returns the unique English name in an array. So let's try again and ask for all
names, along with the languages in which they are encoded:
Bingo! We find that this object has names in English (en), Spanish (es), French (fr), Italian (it),
and German (de).
Here's how we can ask for a name of the object in a specific language other than our preferred
language:
Query:
{
  "id":"#1f8000000000003b60",
  "name":{
    "value":null,
    "lang":"/lang/fr"
  }
}
Result:
{
  "id" : "#1f8000000000003b60",
  "name": {
    "value": "Anarchisme",
    "lang": "/lang/fr"
  }
}
{
"type":"/music/album",
"name":"Synchronicity",
"artist":"The Police",
"track":[{"name":null, "length":null}]
}
Metaweb also allows us to ask "What are the names and lengths of the long songs on the album?"
The query below includes a numeric constraint on the length property, and the freebase.com
response only includes the two songs on the album that are longer than 300 seconds:
The line "length>":300 in the query expresses a constraint to Metaweb: it specifies that the track
must be longer than 300 seconds. In addition to >, you can also use < for less-than, and <= and
>= for less-than-or-equal and greater-than-or-equal. Note, however, that no spaces are allowed
before or after these punctuation characters.
This constraint syntax looks quite odd at first. It is a result of the limitations of the JSON format:
everything must be expressed with property names, colons, and values. We would like to be able
to express a constraint like:
But that is not legal JSON syntax, so we express it instead like this:
"length<=" : 300
You can include more than one numeric constraint on the same property, restricting the value to
a range. Here's how we ask for songs that are at least three minutes long, but less than four:
{
"type":"/music/album",
"name":"Synchronicity",
"artist":"The Police",
"track":[{
"name":null,
"length":null,
"length>=":180,
"length<":240
}]
}
If you include a constraint on a property, you must also ask Metaweb to return the value of that
property. You cannot, for example, ask: "List all songs longer than 5 minutes, but don't bother to
tell me exactly how long they are."
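When queries are assembled by a program, the two rules above (no spaces around the operator, and the constrained property must also be queried) are easy to enforce in one place. The sketch below is a minimal illustration; the helper name with_constraints is hypothetical and is not part of any Metaweb library.

```python
import json

ALLOWED_OPERATORS = ("<", "<=", ">", ">=")

def with_constraints(prop, *constraints):
    """Build the query pairs for one property plus its numeric constraints.

    Each constraint is an (operator, value) tuple. The operator is appended
    directly to the property name, with no spaces, and the property itself
    is also queried with null so that its value is returned.
    """
    pairs = {prop: None}
    for operator, value in constraints:
        if operator not in ALLOWED_OPERATORS:
            raise ValueError("unsupported operator: %r" % (operator,))
        pairs[prop + operator] = value
    return pairs

# Tracks at least three minutes long, but less than four
track = {"name": None}
track.update(with_constraints("length", (">=", 180), ("<", 240)))

query = {
    "type": "/music/album",
    "name": "Synchronicity",
    "artist": "The Police",
    "track": [track],
}
print(json.dumps(query, indent=2))
```

The printed query has the same pairs as the range query shown above for Synchronicity.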
[{
"type":"/music/album",
"name":null,
"artist":null,
"release_date":null,
"release_date>=":"1999-01-01",
"release_date<=":"1999-12-31"
}]
[{
"type":"/music/track",
"artist":null,
"name":null,
"name~=":"love",
"length":null,
"length<":120
}]
Here's a query for songs about love recorded by bands whose name begins with "The":
[{
"type":"/music/track",
"artist":null,
"artist~=":"^The",
"name":null,
"name~=":"love"
}]
Results include If You Love Somebody, Set Them Free by The Police and I'm Sick of Love by The
White Stripes.
Notice that the constraint on the artist property in the query above uses the ^ character to specify
that the word The must appear at the beginning of the artist's name. If you're familiar with regular
expressions, this might make you think that Metaweb supports pattern matching with regular
expressions. In fact, Metaweb's matching syntax is closer to that used by internet search engines.
Table 3.2 summarizes MQL pattern matching syntax. Note that all searches are case-insensitive.
Footnote 6: If you've done programming with languages like Perl or Ruby, this syntax should look familiar. If you're not already familiar with
it, think of "~=" as meaning "approximately equal" or "like".
Here's a query to find all bands whose name is two words long and begins with the word The
(such as The Police and The Clash):
[{
"type" : "/music/artist",
"name" : null,
"name~=" : "^The *$"
}]
What bands have three-word names that begin with "the" and end with a plural (e.g. The Beach
Boys, The Doobie Brothers)?
[{
"type" : "/music/artist",
"name" : null,
"name~=" : "^The * *s$"
}]
In addition to matching text with ~=, string constraints can also be applied with the <, >, <=, and
>= operators, which compare strings in case-insensitive, Unicode-aware alphabetical order. For
example, to find bands whose name begins with one of the letters A through F, use this query:
Note that it is not legal to constrain the id property, with either the pattern-matching operator or
the greater-than or less-than operators.
[{
"type":"/music/artist",
"name":null,
"limit":2000
}]
limit is not a property name: it is a reserved word in MQL. No type may have a property named
"limit". Limits can be useful to prune the result tree of values you aren't really interested in. The
following query, for example, asks "What bands have names that begin with 'The' and have
recorded songs longer than 8 minutes? I'm only interested in the band name, so just give me one
of the long songs, not the full list."
[{
"type" : "/music/artist",
"name" : null,
"name~=" : "^The",
"track": [{
"name":null,
"length":null,
"length>":480,
"limit":1
}]
}]
Note that we use a limit of one in the above. Specifying a limit of zero means "don't limit the
results: return everything you've got". Although MQL allows you to ask for an unlimited number
of results, Metaweb does not guarantee that you'll always get an answer. Complicated queries
with a large number of results may time out before Metaweb can complete the result.
{
"type" : "/music/artist",
"name" : "The Police",
"album" : []
}
If we want to limit the result to five albums, we must rewrite the query as follows:
{
"type" : "/music/artist",
"name" : "The Police",
"album" : [{"name":null, "limit":5}]
}
As you can see, the sort directive simply specifies the name of the property by which the sort
is to be done. To order these same tracks from shortest to longest, use "length" as the sort key:
To reverse this order, precede the name of the sort key by a minus sign:
The sorts shown above are convenient, but could easily be duplicated on the client side. That is,
you could request unordered results from Metaweb and sort them yourself. One situation in which
the sort directive cannot be duplicated on the client is when it interacts with the limit directive.
Result sets are truncated to the specified limit after the sort is applied. Use sort and limit together
in queries like this:
(Note that explicitly specifying a limit of 1 means that we can safely omit the square brackets
from the query.)
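The reason a client cannot reproduce sort plus limit on its own is that truncation after sorting and truncation before sorting give different answers. The sketch below uses made-up track data (not real Freebase results) and hypothetical function names to show the difference.

```python
# Made-up track data for illustration; these are not real Freebase results.
tracks = [
    {"name": "Track A", "length": 215.0},
    {"name": "Track B", "length": 301.5},
    {"name": "Track C", "length": 98.2},
    {"name": "Track D", "length": 260.7},
]

def sort_then_limit(items, key, limit):
    """Order by key, longest first, and only then truncate."""
    return sorted(items, key=lambda item: item[key], reverse=True)[:limit]

def limit_then_sort(items, key, limit):
    """What a client is left with if an unordered result is truncated first."""
    return sorted(items[:limit], key=lambda item: item[key], reverse=True)

print(sort_then_limit(tracks, "length", 1))   # the longest track overall
print(limit_then_sort(tracks, "length", 1))   # merely the first track received
```

Only the first function corresponds to the behavior described above, where the result set is truncated after the sort is applied.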
Sorting need not be limited to a single sort key. To specify more than one key, use an array on
the right-hand side of the sort directive:
// List all tracks by The Police, sorted by album name and track name
[{
"type":"/music/track",
"artist":"The Police",
"name":null,
"album":null,
"sort":["album","name"]
}]
// List all albums by The Police, along with the name of their longest track.
// Order the albums from longest longest track to shortest longest track.
[{
"type":"/music/album",
"artist":"The Police",
"name":null,
"track":{
"name":null,
"length":null,
"sort":"-length",
"limit":1
},
"sort":"-track.length"
}]
Some data, such as the tracks on an album, have a natural order. If you want results to be sorted
according to this natural ordering, use "sort":"index". (Or, to reverse the natural ordering, use
"sort":"-index".)
Footnote 7: Metaweb's ordered collections are sometimes described as lists, but this term is inaccurate because lists are allowed to have duplicate
elements. Metaweb's ordered collections are still fundamentally sets, and duplicates are not allowed.
Since we've used "index" as a sort key, we must query the value of "index" as well. index is a
keyword in MQL and is not a true property of any object. It can be queried, however, and when
you do this, Metaweb returns a non-negative integer. It is important to understand that the notion
of order does not apply to objects in Metaweb, but to the relationships between objects. It is the
link between the album "Synchronicity" and the track "Mother" that has an index of 3, not the
track itself. This becomes clear when you consider the case of a track that appears on more than
one album. If "Mother" also appears on an album named "Greatest Hits" it is likely to have a
different index on that album. 8
Since index is not a true property, there are a lot of things you cannot do with it. You cannot
constrain the index with property names index> or index<. MQL read queries may use index as
a sort key, and they may query the index with "index":null, but may not use the keyword in
any other way. You cannot write "index":1 to ask for the second item in a set, for example. (The
index keyword can be used in other ways in write queries, however, and we'll learn about that
in Chapter 5).
The index keyword can be used in conjunction with the limit directive. Consider the following
query, which asks for the last two tracks on Synchronicity:
Query:
{
  "type":"/music/album",
  "artist":"The Police",
  "name":"Synchronicity",
  "track":[{
    "name":null,
    "index":null,
    "sort":"-index",
    "limit":2
  }]
}
Result:
{
  "type": "/music/album",
  "artist": "The Police",
  "name": "Synchronicity",
  "track": [{
    "index": 1,
    "name": "Murder by Numbers"
  },{
    "index": 0,
    "name": "Tea in the Sahara"
  }]
}
Footnote 8: In Metaweb's schema tracks only appear on a single album: if multiple albums have a track by the same name, each one is a unique
object. So the example given here could not actually happen.
The number that Metaweb returns as the value of the index property is a synthetic one, generated
by Metaweb as a simple way to express the order of elements. If Metaweb returns an array
holding n elements, then it generates index values for those elements that range from 0 to n-1.
For example, if you ask for the last two tracks on an album, the resulting values have indexes 0
and 1. If you ask for tracks that are shorter than 2 minutes and Metaweb finds three of them, then
it will assign them index values of 0, 1, and 2. If you want to know the track number for the tracks
on a particular album, you must query the complete set of tracks. Then add one to the index value
to get the track number. If you want to know the track numbers of the short songs, you must
query the complete set of tracks, and search for the short songs yourself.
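The client-side work described above can be sketched as follows. The sketch assumes the complete track list has already been fetched with "index":null and "length":null; the album data is made up, and the function names are hypothetical.

```python
def track_numbers(tracks):
    """Map track name -> track number, given the complete, index-queried track list."""
    return {track["name"]: track["index"] + 1 for track in tracks}

def short_tracks(tracks, max_seconds):
    """Pick out the short tracks on the client, keeping each one's true track number."""
    return [
        (track["index"] + 1, track["name"])
        for track in sorted(tracks, key=lambda t: t["index"])
        if track["length"] < max_seconds
    ]

# Made-up album for illustration: names and lengths are not real Freebase data.
album_tracks = [
    {"index": 0, "name": "Opening", "length": 203.0},
    {"index": 1, "name": "Interlude", "length": 95.5},
    {"index": 2, "name": "Closing", "length": 110.0},
]

print(track_numbers(album_tracks))
print(short_tracks(album_tracks, 120))
```

Filtering on the client preserves the true track numbers. Asking Metaweb for only the short tracks would instead number them from 0 within the filtered result.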
[{
"type" : "/music/artist",
"name" : null,
"name~=" : "^The",
"album":[{
"name":null,
"name~=": "greatest hits",
"optional":true
}]
}]
Without the optional directive, the query would only return bands whose name begins with The
who have released a Greatest Hits album. With the optional directive, we get all bands whose
name begins with The, and additionally, we get the name of any albums they have released that
include the phrase "greatest hits".
Optional queries can be nested inside optional queries. The following query is an extension to
the one above. It further asks for the names of tracks longer than 5 minutes, if any exist, on the
Greatest Hits album, if it exists.
[{
"type" : "/music/artist",
"name" : null,
Note that it is legal, but never necessary or useful, to add "optional":false to a query. Also, it
is never useful to use the optional directive in the top-level of a query. Queries are implicitly
optional at that level: if Metaweb can't find a match, it returns an empty result.
Query:
{
  "id":"#1f800000000006df1b",
  "name":null,
  "type":[]
}
Result:
{
  "id":"#1f800000000006df1b",
  "name":"The Police",
  "type":[
    "/common/topic",
    "/music/artist"
  ]
}
What do you do if you want to query one property, such as a list of albums from one type, and
another property, such as a list of images, from a second type? MQL addresses this issue by
allowing you to specify a fully-qualified property name that includes the name of the type to which
it belongs. So here is how we ask for the albums by, and pictures of, The Police:
{
"type":"/music/artist",
"name":"The Police",
"album":[],
"/common/topic/image":[{}]
}
The first line of this query specifies that the object to be matched should be of type /music/artist.
The second line specifies the name of the object. name and type are properties of /type/object,
and are shared by all objects in the database. These property names (along with id, guid, key,
The third line of the query asks for a property named album. This property is not defined by
/type/object, but it is defined by /music/artist, and the query has already declared that the
object will be an instance of that type. The fourth line asks for a property named image. This is
not defined by /type/object nor by /music/artist, and so we must qualify it with the name
of its type so that Metaweb can understand it.
For symmetry, and to be explicit, you can rewrite the query to fully-qualify both properties of
interest:
{
"type":"/music/artist",
"name":"The Police",
"/music/artist/album":[],
"/common/topic/image":[{}]
}
If you do this, you might be tempted to drop the initial type specification, since the album property
is now fully-qualified:
[{
"name":"The Police",
"/music/artist/album":[],
"/common/topic/image":[]
}]
Notice that we've put the top-level query in square brackets now. This query will return any object
whose name is The Police, even if it has no album or image properties, and even if it is an instance
of neither /music/artist nor /common/topic.
Note that qualified property names use / as a delimiter and nested sort keys use . as a delimiter.
If your query uses qualified property names and sorts by those names, you may end up using
both delimiter characters. The following query is a variation on one shown earlier, in which two
of the properties have been (unnecessarily) qualified. Note the lengthy sort key:
// Police songs from albums released before 1990, sorted by album name
[{
"type":"/music/track",
"artist":"The Police",
"name":null,
"/music/track/album":{
"/type/object/name":null,
"release_date":null,
"release_date<":"1990"
},
"sort":"/music/track/album./type/object/name"
}]
Query:
{
  "id":"#1f8000000002f9e349",
  "*":null
}
Result:
{
  "id":"#1f8000000002f9e349",
  "guid":"#1f8000000002f9e349",
  "name": "Synchronicity",
  "type": ["/music/album", "/common/topic"],
  "key": [
    "1299f319-8ff4-44fb-8440-7fb990972864",
    "RELEASE3178"
  ],
  "creator": "/user/mwcl_musicbrainz",
  "permission": "/boot/all_permission",
  "timestamp": "2006-11-30T13:42:18.0194Z"
}
This query identifies a unique object by ID, and then uses a wildcard to ask for all of its properties.
Since no type has been specified, the wildcard is expanded with all the properties of /type/object,
and the result is as shown above.
Note that some of the properties expand to a single value, and others to arrays. Thus the syntax
"*":null really means "*":null-or-[]. We could instead write the query using "*":[]. In this
case, all of the properties are returned as arrays, even unique properties.
Now let's modify the query to specify a type other than the default of /type/object:
{
"type":"/music/album",
"name":"Synchronicity",
"*":null
}
In this query, the * wildcard expands differently. Since we have specified that the object is of
type /music/album, Metaweb looks up the properties of that type and queries each one with a
null or [], depending on whether the property is unique or not. It does this in addition to also
querying the common object properties shown in the query result above.
Note that if a property is explicitly listed in a query, a wildcard expansion will not overwrite it.
Consider this:
{
"type":"/music/album",
"name":"Synchronicity",
"track":[{}],
"*":null
}
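The expansion rule for "*":null can be sketched in Python. The table of common object properties below is inferred from the single-valued and array-valued entries in the "*":null result shown earlier, so treat it as an assumption. The sketch also ignores type-specific properties such as those of /music/album, and the function name expand_wildcard is hypothetical.

```python
# Common object properties and whether each is unique, inferred from the
# earlier "*":null result (single values vs. arrays). This table is an
# assumption; the real list comes from the schema.
OBJECT_PROPERTIES = {
    "id": True, "guid": True, "name": True, "type": False, "key": False,
    "creator": True, "permission": True, "timestamp": True,
}

def expand_wildcard(query, properties=OBJECT_PROPERTIES):
    """Replace "*":null with one pair per property, leaving explicit pairs alone."""
    if "*" not in query or query["*"] is not None:
        return dict(query)            # no "*":null wildcard to expand
    expanded = {k: v for k, v in query.items() if k != "*"}
    for prop, unique in properties.items():
        if prop not in expanded:      # explicit pairs win over the wildcard
            expanded[prop] = None if unique else []
    return expanded

print(expand_wildcard({"id": "#1f8000000002f9e349", "*": None}))
print(expand_wildcard({"type": "/music/album", "name": "Synchronicity", "*": None}))
```

In the second call the explicit type and name pairs survive unchanged, which mirrors the rule that a wildcard expansion does not overwrite a property that is listed explicitly.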
Wildcards can also be used in a second, more aggressive, form. "*":{} expands to query each
property with {} or [{}] instead of null or []. Similarly, "*":[{}] expands to query each
property, even unique properties, with [{}]. Let's repeat the query with which we began this
section, using "*":{} instead. With this query, each of the properties of /type/object is expanded
into a complete object, and the result is much longer. The long response is reproduced here in its
entirety because it serves as a useful review of the structure of some of the most fundamental
Metaweb data types:
Query:
{
  "id":"#1f8000000002f9e349",
  "*":{}
}
Result:
{
  "id": "#1f8000000002f9e349",
  "guid": {
    "type": "/type/id",
    "value": "#1f8000000002f9e349"
  },
  "name": {
    "lang": "/lang/en",
    "type": "/type/text",
    "value": "Synchronicity"
  },
  "type": [{
    "type": ["/type/type"],
    "id": "/music/album",
    "name": "Record album"
  },{
    "type": ["/type/type"],
    "id": "/common/topic",
    "name": "Topic"
  }],
  "key": [{
    "type": "/type/key",
    "namespace":
      "/user/metaweb/datasource/MusicBrainz",
    "value":
      "1299f319-8ff4-44fb-8440-7fb990972864"
  }, {
    "type": "/type/key",
    "namespace":
      "/user/metaweb/datasource/MusicBrainz/name",
    "value": "RELEASE3178"
  }],
  "creator": {
    "type": ["/type/user"],
    "id": "/user/mwcl_musicbrainz",
    "name": null
  },
  "permission": {
    "type": ["/type/permission"],
[{
"type":"/music/artist",
"name":null,
"name~=":"^The",
"album":"Greatest Hits"
}]
This query says: tell me the names of objects which have type "/music/artist" AND which have
a name that begins with "The" AND which have an album named "Greatest Hits".
Suppose we want to find the names of all bands who have an album named "Greatest Hits" AND
an album named "Super Hits". We might try this query:
[{
"type":"/music/artist",
"name":null,
"album":["Greatest Hits","Super Hits"] // Invalid MQL
}]
But this is not legal MQL. And if it were, it would probably mean "find an artist who has recorded
exactly two albums, with names 'Greatest Hits' and 'Super Hits'". A musical artist object may
have multiple album links to album objects. We want to constrain our query so that all result objects
have links to two specific album names. Here's a natural way to express this query:
[{
"type":"/music/artist",
"name":null,
"album":"Greatest Hits",
"album":"Super Hits" // Invalid JSON
}]
This query makes sense in the Metaweb object model: find objects that have one "album" link
to an album named "Greatest Hits" and another "album" link to an album named "Super Hits".
Unfortunately, this query is not valid JSON: since it includes the same property name twice, it
cannot be parsed into object form.
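How a JSON parser reacts to a repeated property name varies from one implementation to another. As one illustration, Python's standard json module accepts the text but silently keeps only the last "album" pair, so the "Greatest Hits" constraint would be lost before the query was ever sent. The hook function below is a hypothetical helper that turns the silent loss into an error.

```python
import json

def reject_duplicates(pairs):
    """object_pairs_hook that refuses an object containing a repeated property name."""
    seen = {}
    for name, value in pairs:
        if name in seen:
            raise ValueError("duplicate property name: %r" % (name,))
        seen[name] = value
    return seen

text = '{"type":"/music/artist","name":null,"album":"Greatest Hits","album":"Super Hits"}'

# The default parser keeps only the last "album" pair.
print(json.loads(text))

# With the hook installed, the repeated name is reported instead of dropped.
try:
    json.loads(text, object_pairs_hook=reject_duplicates)
except ValueError as error:
    print(error)
```

Either way, an object with two "album" pairs cannot carry both constraints.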
Query:
[{
  "type":"/music/artist",
  "name":null,
  "a:album":"Greatest Hits",
  "b:album":"Super Hits",
  "limit":2
}]
Result:
[{
  "type": "/music/artist",
  "name": "Alice Cooper",
  "a:album": "Greatest Hits",
  "b:album": "Super Hits"
},{
  "type": "/music/artist",
  "name": "Dan Fogelberg",
  "a:album": "Greatest Hits",
  "b:album": "Super Hits"
}]
Note that the arbitrary prefixes we choose for the query are repeated in the result objects. The
prefixes are arbitrary, but they must be valid identifiers, which means they cannot contain
punctuation characters and must not begin with a digit.
This property prefixing scheme is not limited to sets of two properties. And prefixed properties
can include operator suffixes. Let's find bands that have lots of hits and have recorded Christmas
albums:
[{
"type":"/music/artist",
"name":null,
"a:album":"Greatest Hits",
"b:album":"Super Hits",
"c:album~=":"christmas",
"c:album":[]
}]
Another use of property prefixes is to constrain a property and also query the property at the same
time. Let's find bands that have released a Greatest Hits album, and also ask for the names of all
the albums they have released:
[{
"type":"/music/artist",
"name":null,
"album":[],
"includes:album":"Greatest Hits"
}]
Note that although property prefixes are arbitrary, we can choose identifiers that add meaning to
our queries.
At the beginning of this tutorial, we wrote a query to determine the types of the object that
represents The Police. In order to do this, we first asked for the id of The Police, and then used the
Query:
{
  "constraint:type":"/music/artist",
  "name":"The Police",
  "query:type":[]
}
Result:
{
  "constraint:type": "/music/artist",
  "name": "The Police",
  "query:type": [
    "/music/artist",
    "/common/topic"
  ]
}
As an interesting aside, let's return to the query with which we started this section. We want to
find bands that have released "Greatest Hits" and "Super Hits" albums. There is actually a way
to do this without property prefixes. It relies on the fact that Metaweb relationships are always
bi-directional and that MQL queries can be "turned inside out":
[{
"type":"/music/artist",
"name":null,
"album":[{
"name":"Greatest Hits",
"artist":{
"album":"Super Hits"
}
}]
}]
Translated into English, this query says: "give me the names of all bands that have released an
album named 'Greatest Hits', the artist of which has released an album named 'Super Hits'." The
album property of a band object refers to an album object. And the artist property of the album
object refers back to the band object. We can use this fact to further constrain the artist. This
technique (some would say "hack") is worth understanding because it illustrates one of the deep
properties of Metaweb objects.
[{
"type":"/music/artist",
"name":null,
Footnote 9: It is possible to send two distinct queries in a single envelope with a single HTTP request. We'll learn how to do this in Chapter 4.
[{
"type":"/music/artist",
"name":null,
"album":"Super Hits"
}]
Combining the results of two queries is fairly straightforward. The only tricky issue is avoiding
duplicates. If a band appears in the results of both queries, for example, you would want to take
care that it did not appear twice in the combined result.
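A minimal sketch of the merge step follows. It deduplicates on the name property because that is all the two queries above return. Names are not guaranteed to be unique, so a more careful client would also ask for id and deduplicate on that instead. The function name merge_results and the sample data are illustrative.

```python
def merge_results(*result_lists, key="name"):
    """Concatenate query results, keeping the first object seen for each key value."""
    merged, seen = [], set()
    for results in result_lists:
        for obj in results:
            if obj[key] not in seen:
                seen.add(obj[key])
                merged.append(obj)
    return merged

# Illustrative results only: "Example Band" is made up, and the other two
# names are borrowed from the prefixed-property result shown earlier.
greatest = [{"type": "/music/artist", "name": "Alice Cooper"},
            {"type": "/music/artist", "name": "Example Band"}]
superhits = [{"type": "/music/artist", "name": "Dan Fogelberg"},
             {"type": "/music/artist", "name": "Alice Cooper"}]

for artist in merge_results(greatest, superhits):
    print(artist["name"])
```

Passing key="id" switches the deduplication to ids, provided both queries include "id":null.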
Combining property prefixes with a pattern matching operator and the optional directive, we
can achieve something vaguely like an OR operation:
[{
"type":"/music/artist",
"name":null,
"album":[],
"album~=":"hits",
"great:album":[{
"name":"Greatest Hits",
"optional":true
}],
"superb:album":[{
"name":"Super Hits",
"optional":true
}]
}]
This query returns all bands that have any albums whose name includes the word "hits". It returns
the names of those albums, and includes optional sub-queries for the particular names we're
interested in.
The reason that this is not a general-purpose way to express OR in MQL is that it only works
when the strings in the array are object ids or guids. The meaning and use of the |= constraint
becomes much clearer with some examples. One straightforward use is to run the same query
over multiple objects that are specified by id. The following query asks for the properties of three
types:
This next example asks for the ids of GIF or PNG (but not JPEG) images:
[{
"type":"/common/image",
"id":null,
"/type/content/media_type":null,
"/type/content/media_type|=":[
"/media_type/image/gif",
"/media_type/image/png"
]
}]
Finally, here is an example that uses both the |= constraint to express an OR and uses a property
prefix to express AND. It asks for the French and Spanish translations of the country name
"England":
Query:
{
  "type":"/location/country",
  "english:name": "England",
  "foreign:name": [{
    "value":null,
    "lang":null,
    "lang|=":["/lang/fr","/lang/es"]
  }]
}
Result:
{
  "type":"/location/country",
  "english:name":"England",
  "foreign:name":[{
    "value":"Angleterre",
    "lang":"/lang/fr"
  },{
    "value":"Inglaterra",
    "lang":"/lang/es"
  }]
}
As another example, suppose you wanted to know what bands had an album named "Greatest
Hits", but wanted to exclude all country music. You could do one query for Greatest Hits albums,
and then do another for all country music bands (using the /music/artist/genre property), and
then subtract the second result from the first. This is not particularly efficient, since there are
probably a whole lot of country music bands. Better would be a single query for albums named
Query:
{
  "type":"/type/type",
  "id":"/type/object",
  "properties":[]
}
Result:
{
  "type": "/type/type",
  "id": "/type/object",
  "properties": [
    "/type/object/id",
    "/type/object/guid",
    "/type/object/type",
    "/type/object/name",
    "/type/object/key",
    "/type/object/timestamp",
    "/type/object/permission",
    "/type/object/creator"
  ]
}
Query:
{
  "type":"/type/property",
  "id":"/type/object/name",
  "*":null
}
Result:
{
  "type": "/type/property",
  "id": "/type/object/name",
  "guid": "#1f80000000000000ca",
  "name": "name",
  "key": ["display_name", "name"],
  "expected_type": "/type/text",
  "unique": true,
  "schema": "/type/object",
  "master_property": null,
  "reverse_property": [],
  "creator": "/user/root",
  "permission": "/boot/root_permission",
  "timestamp": "2006-11-30T12:43:53.0081Z"
}
Footnote 10: As a matter of convention, property names are usually singular, even when they are non-unique properties and multiple values are
expected. /type/type/properties is an exception to this rule.
Query:
{
  "id":"/music/album/track",
  "/type/property/expected_type":null
}
Result:
{
  "id":"/music/album/track",
  "/type/property/expected_type":"/music/track"
}
Note that we omitted a type specification from this query and instead simply used the fully-
qualified name of the expected_type property.
If you were planning to write a program that made many music-related queries, you might first
want to explore all of Freebase's music-related types. But where do you get a list? You query the
domain: 11
Query:
{
  "type":"/type/domain",
  "id":"/music",
  "types":[]
}
Result:
{
  "type": "/type/domain",
  "id": "/music",
  "types": [
    "/music/group_membership",
    "/music/recording_contribution",
    "/music/track",
    "/music/album",
    "/music/album_release_type",
    "/music/artist",
    "/music/genre",
    "/music/group_member",
    "/music/instrument",
    "/music/performance_role",
    "/music/record_label",
    "/music/voice"
  ]
}
Footnote 11: /type/domain/types is another plural property name.
Let's begin at the top level. A query is a comma-separated list of one or more pairs, enclosed
in curly braces, which are optionally enclosed in square brackets:
A property begins with a property name in quotation marks followed by a colon and a property
value. The property value may be a nested query, a JSON literal value or an "empty" value such
as null or []. As a special case, the index query "index": null is also allowed.
empty:: null | [ ] | { } | [ { } ]
A simple-name is an identifier that is not a reserved word. A prefix is the same thing. A quali-
fied-name consists of a slash and one or more identifiers followed by slashes, all followed by a
simple-name. Finally, an identifier is a string of ASCII characters that begins with a letter and
consists of letters, numbers and underscores. Additionally, an identifier may not end with an
underscore or contain two underscores in a row. Reserved words include sort, limit, optional
and index, and also include directives used by the MQL write grammar (see Chapter 5) and a
number of other identifiers that are reserved for possible use in the future:
property-name:: simple-name |
qualified-name |
prefix . : . simple-name |
prefix . : . qualified-name
simple-name:: identifier <but not reserved-word>
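The naming rules above translate directly into a small validator. The sketch below is one possible reading of those rules in Python. The reserved-word set contains only the four words named in the text (the full list is longer), and the function names are hypothetical.

```python
import re

# A letter, then letters or digits, where any underscore is single and is
# always followed by a letter or digit (so no "__" and no trailing "_").
IDENTIFIER = re.compile(r"[A-Za-z](?:_?[A-Za-z0-9])*")

# Only the reserved words named in the text; the full list is longer.
RESERVED = {"sort", "limit", "optional", "index"}

def is_identifier(text):
    return IDENTIFIER.fullmatch(text) is not None

def is_simple_name(text):
    """An identifier that is not a reserved word (a prefix obeys the same rule)."""
    return is_identifier(text) and text not in RESERVED

def is_qualified_name(text):
    """A slash, one or more identifiers followed by slashes, then a simple-name."""
    parts = text.split("/")
    return (len(parts) >= 3 and parts[0] == ""
            and all(is_identifier(part) for part in parts[1:-1])
            and is_simple_name(parts[-1]))

for candidate in ["release_date", "release__date", "name_", "limit",
                  "/music/artist/album", "album"]:
    print(candidate, is_simple_name(candidate), is_qualified_name(candidate))
```

A prefixed property name such as "a:album" can be checked by splitting once on the colon and validating the prefix with is_simple_name and the remainder with either of the other two functions.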
The remaining kinds of pair that can appear in a query are wildcard, comparison and directive.
A wildcard is an asterisk in quotes (to indicate a wildcard property name) followed by a colon
and an empty query (a null, [], {} or [{}]):
A comparison is quoted name followed by a colon and a value, where the name includes an op-
erator and the value is a string or a number:
The sort directive syntax requires further explanation. A sort directive is the keyword "sort"
followed by a colon and a sort key or an array of sort keys. A sort key is the keyword "index"
or one or more property names separated with . characters:
The MQL grammar for updating Metaweb has some additional rules. We'll learn about those in
Chapter 5.
http://www.freebase.com/api/service/mqlread
To submit a query to the mqlread service, place the query inside an "envelope" object. Next, use
JSON to serialize the query object. Then URL encode the JSON string and prefix it with
?queries=. Finally, append the whole thing to the URL above, and retrieve the content of the resulting
URL with an HTTP GET request.
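The steps above can be sketched in Python using only the standard library. The envelope shape, with an outer property named "qname" wrapping an inner "query" property, is taken from the Perl example later in this chapter. The function name mqlread_url is hypothetical, and the code only constructs the URL; it does not contact the service or handle authentication.

```python
import json
from urllib.parse import quote

MQLREAD_URL = "http://www.freebase.com/api/service/mqlread"

def mqlread_url(query, name="qname"):
    """Wrap a query in an envelope, serialize it, URL encode it, and build the GET URL."""
    envelope = {name: {"query": query}}         # two-layer envelope object
    serialized = json.dumps(envelope)           # JSON serialization
    return MQLREAD_URL + "?queries=" + quote(serialized, safe="")

query = {"type": "/music/artist", "name": "The Police", "album": []}
print(mqlread_url(query))
```

Passing safe="" to quote makes sure that slashes inside type names are percent-encoded along with the braces, quotes, and colons.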
While freebase.com is being rolled out and scaled up, the mqlread service at that site requires
HTTP cookie headers for authentication. If you are an early user at freebase.com, this issue
will affect you. Some of the examples in this chapter demonstrate how to correctly log in
to freebase.com, obtain credentials, and submit them to the mqlread service using HTTP
cookies. Others, like the one that follows, simply ask you to hardcode your cookie data
into the script.
To find the authentication data you need, first visit the freebase.com home page and make
sure you are logged in. This will ensure that your browser has stored the necessary cookies.
Obtaining cookie data is a browser-specific task. If you are using Firefox, pull down the
Edit menu and select Preferences. Click the Show Cookies... button in the dialog that
appears. This will open a second dialog. Enter "freebase.com" into the Search box, and
then scroll through the resulting list of cookies until you find the one named metaweb-user.
Highlight this entry in the list to view the content of the cookie. It will probably look something
like this:
A|u_docs|g_#1f8000000001209013|4.xwItasvXuoVOQiXg3Sm04b
This is the string you'll need to enter into Example 4.1 and Example 4.6. (You have to use
your own, though: the cookie shown here is not actually valid.)
Example 4.1 is a command-line utility that lists the albums released by any band you specify. It
uses the Metaweb API to retrieve data from freebase.com. It is written in Perl, and demonstrates
how to nest an MQL query within an envelope and send that envelope to the mqlread service.
(The structure of the envelope object will be explained in Section 4.2.3.) It sends hard-coded
authentication credentials to mqlread using HTTP cookies. Until freebase.com has fully opened
its services to the world, you'll have to insert your own cookie data into this script to make it
work.
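#!/usr/bin/perl
use URI::Escape; # This module provides the uri_escape function used below
# Assumed setup (illustrative values): the band name comes from the command
# line, the cookie is hard-coded, and the query asks for the band's albums.
$band = $ARGV[0];
$auth = 'metaweb-user=YOUR_COOKIE_DATA'; # Replace with your own cookie data
$query = '{"type":"/music/artist","name":"' . $band . '","album":[]}';
# Now place the query in JSON envelope objects, and URL encode the envelopes
$envelope = '{"qname":{"query":' . $query . '}}';
$escaped = uri_escape($envelope);
$url = 'http://www.freebase.com/api/service/mqlread?queries=' . $escaped;
# Use the command-line utility curl to supply the cookies and fetch the
# content of the URL.
$result = `curl -s --cookie \'$auth\' $url`;
# Use regular expressions to extract the album list from the HTTP response
$result =~ s/^.*"album"\s*:\s*\[\s*([^\]]*)\].*$/$1/s;
$result =~ s/[ \t]*"[ \t,]*//g;
print "$result\n"; # Print the extracted album names (assumed final step)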
Example 4.2 is a higher-level version of Example 4.1. It uses a JSON serializer and parser and
also a higher-level API for URL manipulation. To use it, you must have the JSON.pm module
(which you can find at http://search.cpan.org) installed. This version of the program also uses a
somewhat more sophisticated query to sort albums by their release date, and also does error
checking and error reporting in case anything goes wrong with the query. Finally, this version
of the program demonstrates how to log in to Metaweb to obtain authentication credentials. Instead
of hardcoding your cookie data into the script, you must instead hardcode your Freebase username
and password. Although this example uses the Metaweb login service, that service is not formally
documented until Chapter 6.
Note that Example 4.2 places the query in the inner and outer envelope objects before JSON
serialization. Note also that the mqlread service returns the query results in its own two-layer
response envelope object. The outer object includes a property with the same name as the outer
object of the query envelope. The inner object of the response has a property named result. The
value of the result property is the result of the MQL query.
#!/usr/bin/perl -w
use strict; # Don't allow sloppy syntax
use JSON; # JSON encoding and decoding
use URI::Escape; # URI encoding
use LWP::UserAgent; # High-level HTTP API
# Create the HTTP "user agent" we'll use to send the query
my $ua = LWP::UserAgent->new;
# Convert the envelope object from Perl hash to JSON string, and URI encode it
my $json = JSON->new(); # Create JSON parser/serializer
my $encoded = $json->objToJson($envelope); # Serialize object to string
my $escaped = uri_escape($encoded); # URI encode the string
http://www.freebase.com/api/service/mqlread
The sub-sections that follow document mqlread input and output, and also specify the format of
the query and response envelopes.
queries The value of this required parameter is a JSON-encoded and URI-encoded
"envelope" object that holds the query or queries to be executed. The format of the
envelope is described in Section 4.2.3.
callback The optional callback parameter allows you to submit a request to mqlread via
a <script> tag. It affects the behavior of mqlread in the following ways:
Suppose that we have two MQL queries that we want mqlread to execute. We write the first
query on a piece of paper, fold it up and place it in an envelope. (This is the "inner query envelope
object"). We name this query "q0" and write those letters on the envelope. Next we write the
second query on another piece of paper. We put that paper in another envelope, and write "q1"
on that envelope. Finally, we place both envelopes within a cardboard box (the outer query
envelope object) and mail the box off to http://www.freebase.com/api/service/mqlread.
The mqlread service opens the box and opens the envelopes it contains. It executes the two
queries and writes the results on two pieces of paper. It places the results of the first query in an
envelope (the inner response envelope) and writes "q0" on that envelope. It does the same for
the results of the second query, and writes "q1" on that envelope. Then it puts the two envelopes
in a box (the outer response envelope) and mails the box back to us.
The envelopes and cardboard boxes in our postal metaphor are just JSON objects, of course.
Here's what the two-layer query envelope described above looks like in JSON:
The response envelope has the same structure as the query envelope. In JSON it might look like
this:
The outer and inner query and response envelopes are described more formally below:
The inner query envelope may include other properties that provide additional information about
how the query should be executed. At the time of this writing the only such property is cursor
which is documented in Section 4.7.
If, on the other hand, something was wrong with the query, then the status property will be
"/mql/status/error", and the inner response envelope will have a messages property whose
value is a JSON array of message objects that provide details about the error or errors. mqlread
error messages are documented in Section 4.6.
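As a rough illustration of the two-layer structure described above, here is a Python 3 sketch that builds a query envelope for two named queries and opens a response envelope. The helper names make_envelope and open_envelope are hypothetical, the band queries are placeholders, and the sample response is made up for illustration. Only the property names (query, status, result, messages) and the status values come from the text above.

```python
import json

def make_envelope(queries):
    """Wrap each named query in an inner envelope; return the outer envelope."""
    return {name: {"query": q} for name, q in queries.items()}

def open_envelope(outer_response, names):
    """Map each query name to its result, or to None if that query failed."""
    results = {}
    for name in names:
        inner = outer_response[name]       # inner response envelope
        if inner.get("status") == "/mql/status/ok":
            results[name] = inner.get("result")
        else:
            results[name] = None           # details are in inner["messages"]
    return results

# Placeholder queries; any MQL query object could go here
envelope = make_envelope({
    "q0": {"type": "/music/artist", "name": "The Police", "album": []},
    "q1": {"type": "/music/artist", "name": "U2", "album": []},
})
print(json.dumps(envelope, indent=2))

# A made-up response with the same two-layer shape (not real mqlread output)
sample_response = {
    "q0": {"status": "/mql/status/ok", "result": {"album": ["Example Album"]}},
    "q1": {"status": "/mql/status/error", "messages": [{"message": "example"}]},
}
print(open_envelope(sample_response, ["q0", "q1"]))
```

Looking results up by the names we sent, rather than iterating over every property of the outer response, keeps the sketch from tripping over any other properties the outer envelope may carry.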
• takes a MQL query (as a Python data structure, not as JSON-serialized text) as its argument;
• sets the URL queries parameter to the serialized and encoded query
• if authentication credentials are passed to the function, it uses them in a Cookie header of the
HTTP request.
• obtains the query result, in text form, by fetching the contents of the URL;
• parses the JSON string returned by mqlread into a Python data structure;
• checks the status property in the inner response envelope to determine if the query was
successful (if the query fails, it extracts the error message from the inner envelope and raises an
exception);
• gets the query result from the inner envelope and returns it as a Python data structure.
This code relies on the simplejson module for JSON encoding and parsing. You can find the
simplejson code at http://cheeseshop.python.org/pypi/simplejson.
# Submit the MQL query q and return the result as a Python object.
# If authentication credentials are supplied, use them in a cookie.
# Raises MQLError if the query was invalid. Raises urllib2.HTTPError if
# mqlread returns an HTTP status code other than 200 (which should not happen).
def read(q, credentials=None):
# Put the query in an envelope
env = {'qname':{'query':q}}
# JSON serialize and URL encode the envelope and the query parameter
# Now open the URL and parse its JSON content
f = urllib2.urlopen(req) # Open the URL
response = simplejson.load(f) # Parse JSON response to an object
inner = response['qname'] # Open outer envelope; get inner envelope
# If anything was wrong with the invocation, mqlread will return an HTTP
# error, and the code above will raise urllib2.HTTPError.
# If anything was wrong with the query, we won't get an HTTP error, but
# will get an error status code in the response envelope. In this case
# we raise our own MQLError exception.
if inner['status'] != '/mql/status/ok':
error = inner['messages'][0]
raise MQLError('%s: %s' % (error['status'], error['message']))
# If there was no error, then just return the result from the envelope
return inner['result'];
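The listing above is written for Python 2 (urllib2 and simplejson). If you are working in Python 3, the serialize-and-encode step might look something like the sketch below, which uses only the standard library. The function name build_request is hypothetical, the band query and cookie value are placeholders, and nothing is sent over the network: the sketch only shows how the envelope ends up in the queries parameter and how credentials end up in a Cookie header.

```python
import json
import urllib.parse
import urllib.request

MQLREAD = "http://www.freebase.com/api/service/mqlread"  # URL given in this chapter

def build_request(q, credentials=None):
    """Return a urllib Request for query q; nothing is sent yet."""
    envelope = {"qname": {"query": q}}           # same envelope as read() above
    # JSON serialize the envelope, then URL encode it as the queries parameter
    encoded = urllib.parse.urlencode({"queries": json.dumps(envelope)})
    req = urllib.request.Request(MQLREAD + "?" + encoded)
    if credentials:                              # e.g. "metaweb-user=..."
        req.add_header("Cookie", credentials)
    return req

# Placeholder query and placeholder cookie data
req = build_request({"type": "/music/artist", "name": "The Police", "album": []},
                    credentials="metaweb-user=PLACEHOLDER")
print(req.full_url)
```

Passing the resulting object to urllib.request.urlopen() would play the role that urllib2.urlopen(req) plays in the listing above.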
With the metaweb.read() function defined, we can now write our album listing code in Python.
Example 4.4 shows how we do this. Note that this example uses the metaweb.login() function
for authentication. The implementation of this function is in Chapter 6.
import sys
import metaweb # Defines the metaweb.read() and login() functions
# Submit the query using metaweb.read() and check for valid results
result = metaweb.read(query, credentials)
if not result or not result['album']: sys.exit('Unknown band')
The code in Example 4.5 is commented and you should be able to follow it even if you are not
familiar with PHP. One point to note is that in PHP the data structure known as an array works
as both a sequential array and as an associative array. That is, JSON objects and JSON arrays
are both arrays in PHP. Example 4.5 depends on an external module for JSON serialization and
parsing. The module used here is from http://pear.php.net.
<?php
/*
* The Metaweb class defines a read() method for invoking the Metaweb
* mqlread service on freebase.com. read() takes a MQL query (as a PHP
* array) and freebase.com authentication credentials. It sends
* that query to the mqlread service and retrieves the response. It parses
* the response to a PHP array, and extracts the query result from the
* response envelopes and returns it. If the query fails, it returns
* null (without providing useful diagnostics).
*/
require "JSON.php"; // A JSON encoder/decoder from http://pear.php.net
class Metaweb {
var $json; // Holds the JSON encoder/decoder object
var $URL = "http://www.freebase.com/api/service/mqlread";
// Use the curl library to send the query and get response text
$request = curl_init($url);
// Parse the server's response from JSON text into a PHP array
$response = $this->json->decode($responsetext);
// Otherwise, open the envelope and just return the actual result object
return $response["qname"]["result"];
}
}
?>
With the PHP utility function defined in Example 4.5, it becomes easy to write simple Metaweb-
enabled web applications in PHP. Example 4.6 demonstrates. It displays an HTML form in which
Like Example 4.1, this example requires you to hard-code the value of your freebase.com
authentication cookie into the script. See the instructions earlier in this chapter for finding the value of
the metaweb-user cookie.
<html>
<body>
<form>Band: <input type="text" name="band"><input type="submit"></form>
<?php
$band = $_GET["band"]; // What band is specified in the URL?
if ($band) { // Only list albums if a band has been specified
require "metaweb.php"; // Import Metaweb utility code
$metaweb = new Metaweb(); // Create a Metaweb object
// Insert your own freebase.com cookie data into the string below
$credentials = 'metaweb-user=### Put your cookie data here ### ';
There are two workarounds to this restriction. The first, and most obvious, is to run a proxy script
on your own site that behaves like the mqlread service but simply forwards your query to
freebase.com.
The second workaround relies on the fact that a query result, in JSON format, is valid JavaScript
code. This means that a mqlread URL can be used as the value of the src attribute of a <script>
tag. When the server returns its result, the <script> tag evaluates the JSON text as JavaScript
code. Evaluating the JSON text creates the JavaScript object we want, but to make this scheme
work, the script then has to be able to do something with that object. The solution is to add another
URL parameter to the mqlread service. If the URL for your query includes a callback= parameter,
then mqlread will take the value of that parameter to be the name of a JavaScript function. Then,
instead of simply returning a JSON text, it will return the specified function name, an open
parenthesis, the JSON text and a close parenthesis. When used this way with a <script> tag, the
JSON text is evaluated, and the object that results is passed to the specified function (which you
have defined previously).
We use the <script> technique in this chapter: it is simple, elegant and in common use across
the internet. If you prefer a proxy and XMLHttpRequest-based approach, you can find sample
proxy code in Appendix A.
One thing you'll notice about our JavaScript examples here (and in Appendix A) is that they
are asynchronous: when you submit a query you do not get the result immediately. Instead, the
callback function you specify is invoked when the result is available. This asynchronous
programming model is common in client-side JavaScript, but is substantially different from the synchronous
model demonstrated in Example 4.5 and other examples.
The JavaScript-based web applications shown in this chapter do not attempt to automatically
obtain authentication credentials. They simply assume that you have visited and logged in
to www.freebase.com before you attempt to run them. If you do this, then your browser
will have the authentication cookies it needs, and it will automatically pass those cookies
to mqlread when <script> tags invoke it.
Example 4.7 lists the albums by a specified band, and also displays the tracks on an album when
the user clicks on the name of the album. There are several features worth noting in this example.
First, notice that this code uses the Metaweb.read() function to send queries to the mqlread service.
We'll see how Metaweb.read() is implemented in the next section. Second, note that the code
displays a "Loading..." message to the user while the queries are pending and also displays an
appropriate message if a query fails. Finally, Example 4.7 demonstrates two different ways to
insert Metaweb query results into an HTML document. When the album query returns, the album
list is populated using DOM methods to build each text node and <div> tag. When the track query
returns, on the other hand, the list of tracks is built as a string of HTML text and is inserted into
the document by setting the innerHTML property of the container element.
In addition to querying album and track names, the queries in this example also ask for album
release date and track length. The example includes utility functions to massage this data, extracting
a year from a /type/datetime string and converting a track length in seconds into the more
familiar mm:ss format.
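The two conversions are simple enough to sketch in a few lines. The version below is in Python rather than JavaScript, and the function names are our own. It assumes that a /type/datetime value is an ISO 8601 style string that begins with a four-digit year, and that track lengths are numbers of seconds that may include a fractional part.

```python
def year_of(datetime_string):
    """Return the leading four-digit year of a string such as '1983-06-01'."""
    return datetime_string[:4]

def to_minutes_and_seconds(seconds):
    """Format a length in seconds as m:ss, rounding to the nearest second."""
    total = int(round(seconds))
    return "%d:%02d" % (total // 60, total % 60)

print(year_of("1983-06-01"))
print(to_minutes_and_seconds(253.4))
```

Zero-padding the seconds with %02d is what keeps a track of 4 minutes and 5 seconds from being displayed as 4:5.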
<html>
<head>
<script src="json.js"></script> <!-- JSON utilities -->
// Issue the query and invoke the function below when it is done
Metaweb.read(query, displayAlbums);
// This function is invoked when we get the result of our MQL query
function displayAlbums(result) {
// If no result, the band was unknown.
if (!result || !result.album) {
albumlist.innerHTML = "<b><i>Unknown band: " + band + "</i></b>";
return;
}
// Issue the query, invoke the nested function when the response arrives
Metaweb.read(query, function(result) {
if (result && result.track) { // If result is defined
var tracks = result.track; // array of tracks
// Build an array of track names + lengths
var listitems = []
for(var i = 0; i < tracks.length; i++) {
var n = tracks[i].name + " (" +
toMinutesAndSeconds(tracks[i].length)+")";
listitems.push(n);
}
// Display the track list by setting innerHTML
tracklist.innerHTML = "<h2>" + albumname + "</h2>" +
"<ol><li>" + listitems.join("<li>") + "</ol>";
}
&callback=Metaweb._3
When the mqlread service is invoked with this callback parameter, it does not return the result
as a pure JSON object. Instead it returns JavaScript code. The code is simply a function invocation
of the function named by the parameter. The invocation includes a JSON object as the single
argument to the function:
Since JSON is a subset of the JavaScript object and array literal syntax, any JSON object is a
valid function argument. By simply wrapping a function invocation around the JSON object,
we've converted the mqlread response into a form suitable for use with a <script> tag.
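To make the wrapping concrete, here is a small Python sketch that models it. This is not mqlread's own code: wrap_in_callback simply imitates the transformation described above, unwrap_callback reverses it, and the response object is made up for illustration.

```python
import json

def wrap_in_callback(callback, response):
    """Return JavaScript text that calls `callback` with the JSON response."""
    return "%s(%s)" % (callback, json.dumps(response))

def unwrap_callback(text):
    """Split 'name(json)' back into the callback name and the decoded object."""
    name, _, rest = text.partition("(")
    # Drop an optional trailing semicolon, then the closing parenthesis
    return name, json.loads(rest.rstrip().rstrip(";")[:-1])

# A made-up response envelope (not real mqlread output)
response = {"qname": {"status": "/mql/status/ok",
                      "result": {"album": ["Example Album"]}}}
text = wrap_in_callback("Metaweb._3", response)
print(text)
print(unwrap_callback(text))
```

In a browser the unwrapping step is unnecessary, since the JavaScript interpreter evaluates the function call directly. It is included here only to show that the wrapped text still carries the same JSON object.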
Note that Metaweb.read() uses JSON.serialize() to serialize the query object into JSON form.
This utility function is defined in Example A.1 in Appendix A. The corresponding JSON.parse()
function is not required, however, since the JavaScript interpreter that processes the <script>
tag serves as our JSON parser.
/**
* metaweb.js:
*
* This file implements a Metaweb.read() utility function using a <script>
* tag to generate the HTTP request and the URL callback parameter to
* route the response to a specified JavaScript function.
**/
var Metaweb = {}; // Define our namespace
Metaweb.HOST = "http://www.freebase.com"; // The Metaweb server
Metaweb.QUERY_SERVICE = "/api/service/mqlread"; // The service on that server
Metaweb.counter = 0; // For unique function names
// Build the URL using encoded query text and the callback name
var url = Metaweb.HOST + Metaweb.QUERY_SERVICE +
"?queries=" + querytext + "&callback=Metaweb." + callbackName
// Create a script tag, set its src attribute and add it to the document
// This triggers the HTTP request and submits the query
var script = document.createElement("script");
script.src = url
document.body.appendChild(script);
};
You'll find a proxy-based implementation of this same Metaweb.read() function in Example A.2
in Appendix A.
{
"status": "400 Bad request",
"messages": [
{
"text": "JSON parse error: Expecting property name at line 1 column 1",
"type": "/service/error/invalid_value",
"field": "queries",
"value": "{<}\r\n",
"level": "error"
This kind of error can occur if you are cutting-and-pasting raw mqlread URLs or if you are entering
MQL queries as JSON text into a query editor application. When you write scripts that use JSON
serializers and invoke mqlread using tested code, this kind of error should not occur. Errors are
still possible, however: a MQL query can be invalid, even if it is expressed using well-formed
JSON and passed to mqlread using the correct URL parameters.
If you submit an invalid MQL query, mqlread returns an HTTP status code of "200 OK", but the
status property of the inner response envelope is "/mql/status/error". The inner response
envelope also includes a property named messages instead of a property named result. The
value of the messages property is an array (usually of length 1) of message objects each of which
has the following properties:
type A string that indicates what kind of message this is. For error messages, this is always
"/mql/error".
status An identifier that names the specific kind of error. /mql/status/parse_error and
/mql/status/type_error are typical values.
Note that the status property of a message object is distinct from, and more
informative than, the status property of the inner response envelope.
info An object that provides additional details about the error. For type errors, for example,
the properties of this object specify the value and type that appeared in the query
and the type that was expected.
query A copy of the query object with the addition of a special error_inside property,
to indicate where the error occurs. For parse errors, this property is omitted, since the
query couldn't be properly parsed.
path A string that specifies the "path" of property names from the root of the MQL query
to the location of the error. If the error is in the outermost object of the query,
then this property is just an empty string. For parse errors, this property is omitted.
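If you want to turn a message object into a one-line diagnostic, something like the following Python sketch may be a reasonable starting point. The function name describe_error is hypothetical, and the sample message objects (including the property names inside info) are invented for illustration. The sketch relies only on the status, path and info properties described above, and tolerates the absence of path for parse errors.

```python
def describe_error(message):
    """Build a one-line diagnostic from an mqlread message object."""
    parts = [message.get("status", "unknown status")]
    path = message.get("path")
    if path is not None:                  # path is omitted for parse errors
        parts.append("at " + (path if path else "the top level of the query"))
    if message.get("info"):
        parts.append("info=%r" % (message["info"],))
    return " ".join(parts)

# Invented message objects, shaped like the description above
type_error = {"type": "/mql/error", "status": "/mql/status/type_error",
              "path": "album", "info": {"expected_type": "/type/text"}}
parse_error = {"type": "/mql/error", "status": "/mql/status/parse_error"}

print(describe_error(type_error))
print(describe_error(parse_error))
```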
Use a cursor when you want to retrieve results in batches from a large result set. Start by including
this property in your inner query envelope:
cursor: true
Example 4.9 is a metaweb.readall() function written in Python. It works like the metaweb.read()
function of Example 4.3, but uses a cursor to iterate through a large result set, making multiple
queries and concatenating the results into a single array before returning them. (Note that this
function doesn't allow any kind of parallelism: it does not allow the first batch of results to be
processed while the second batch is being fetched, for example. So if you're using a limit directive
and cursors to improve response time, this readall() function is not appropriate.)
# Submit the MQL query q and return the result as a Python object
# This function behaves like read() above, but uses cursors so that
# it works even for very large result sets
def readall(q, credentials=None):
# This is the start of the mqlread URL.
# We just need to append the envelope to it
urlprefix = 'http://%s%s?queries=' % (host, readservice)
# The query and most of the envelope are constant. We just need to append
# the encoded cursor value and some closing braces to this prefix string
jsonq = simplejson.dumps(q);
envelopeprefix = urllib.quote_plus('{"q0":{"query":'+jsonq+',"cursor":')
# Finally, get the new value of the cursor for the next iteration
cursor = inner['cursor']
if cursor: # If it is not false, put it
cursor = '"' + cursor + '"' # in quotes as a JSON string
# Now that we're done with the loop, return the results array
return results
It is important to understand that cursors only work when multiple results are expected at the
top-level of the query. The cursor property is part of the mqlread envelope syntax, not part of
the MQL query language, and it cannot be applied to sub-queries of a query. Another way to say
this is that it only makes sense to include "cursor":true in an envelope if the first character
following "query": in the envelope is [. The query must be expressed as an array in order for a
cursor to be meaningful.
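The cursor loop itself can be sketched independently of any HTTP code. In the Python sketch below, fetch is a hypothetical callable that sends one inner query envelope to mqlread and returns the matching inner response envelope. The stand-in fake_fetch serves three invented batches so that the loop can be run offline. The sketch assumes, as in readall() above, that the cursor value in the response is a string while more results remain and false once the result set is exhausted.

```python
def read_all(fetch, query):
    """Collect every batch of results for a query with an array at its top level."""
    if not isinstance(query, list):
        raise ValueError("cursors need an array at the top level of the query")
    results = []
    cursor = True                       # "cursor": true requests the first batch
    while cursor:
        inner = fetch({"query": query, "cursor": cursor})
        results.extend(inner["result"])
        cursor = inner["cursor"]        # false when there are no more results
    return results

def fake_fetch(inner_query):
    """Stand-in for mqlread: serves three invented batches keyed by cursor."""
    pages = {True: ([1, 2], "c1"), "c1": ([3, 4], "c2"), "c2": ([5], False)}
    result, next_cursor = pages[inner_query["cursor"]]
    return {"status": "/mql/status/ok", "result": result, "cursor": next_cursor}

print(read_all(fake_fetch, [{"type": "/music/album", "name": None}]))
```

The check at the top of read_all mirrors the rule stated above: a query whose outermost value is an object rather than an array is rejected before any request is made.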
This is a perfectly valid query, and works just fine in Example 4.4. But suppose we wanted to
port that script to use the metaweb.readall() function defined above. To do this, we'd also have
to alter the query so that the array of albums was at the top level of the query:
The trans service is so named because in addition to fetching the requested data, it can also
translate it for you. For example, it can "translate" an image to thumbnail size.
The trans service is HTTP based, just as mqlread is. Content is retrieved by specifying the desired
translation and the content id, with a URL of this form:
http://www.freebase.com/api/trans/translation/guid
http://www.freebase.com/api/trans/raw/%239202a8c04000641f8000000003c1978c
raw Use raw to request that no translation be done on the data: it is
returned as is. (Note, however, that HTML content is not completely raw: it
is "sanitized" by stripping executable content such as JavaScript.)
blurb Use blurb to request an excerpt from the beginning of a document. This
provides a preview of the kind you might see in a list of search
results.
The path component that follows the translation is the URL-encoded version of a Metaweb guid.
%23 is the encoding of the # character, and the letters and digits that follow are the hexadecimal
digits of the guid. The guid passed to trans must identify an object of type /type/content,
/common/image or /common/document. These three types are closely related:
/common/image When an image is added to the content store, the /type/content object
for the image is co-typed /common/image, in order to add a size
property that supplies the image dimensions. For images, therefore,
the guid of the /type/content and /common/image objects are the
same.
Given the guid of a document object, the trans service returns the
content of both Wikipedia and non-Wikipedia documents. For non-
Wikipedia documents, you can use either the guid of the /common/document
object or of the /type/content object it refers to.
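Building a trans URL by hand is mostly a matter of remembering to encode the # at the start of the guid. A minimal Python sketch, in which the function name trans_url is our own and the URL prefix is taken from the URL form shown above:

```python
import urllib.parse

TRANS = "http://www.freebase.com/api/trans/"    # prefix from the URL form above

def trans_url(translation, guid):
    """Build a trans URL for the given translation name and guid."""
    # safe="" makes quote() encode every reserved character, including '#'
    return TRANS + translation + "/" + urllib.parse.quote(guid, safe="")

print(trans_url("raw", "#9202a8c04000641f8000000003c1978c"))
```

If the # were left unencoded, a browser would treat everything after it as a fragment identifier and never send the guid to the server.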
The trans service does not support a callback parameter as the mqlread service does, so you
cannot use it with <script> tags. If you implement a proxy on your own web server, then you
can invoke the trans service indirectly to retrieve content with XMLHttpRequest, however.
It is usually easier, however, to use the trans service with <img> and <iframe> tags. To retrieve
and display an image, simply use a trans URL as the src attribute of an <img> tag. And to retrieve
and display the HTML content of a document, use a trans URL as the src attribute of an <iframe>.
Like mqlread, the trans service requires cookie-based authentication during the freebase.com
roll-out period. The examples in this chapter assume that you are using the trans service in a web
browser that has visited and logged on to www.freebase.com.
When you display Metaweb content in an <iframe>, the origin of the framed content is
different from the origin of your web application. This means that the same-origin security
rules apply. The user of your web application can see the framed Metaweb content, but
JavaScript code in your application cannot access this content.
Metaweb strips executable code from HTML before returning it to you, but if any JavaScript
code were somehow to make it past Metaweb's sanitizer, that code would also be subject
to the same-origin policy and would be unable to interact with your web application content.
In this way, using an <iframe> for content fetched with trans gives you an extra layer of
security.
Example 4.10. WhatsNew.html: fetching new images and documents from freebase.com
<head>
<script src="json.js"></script>
<script>
// These are a few important constants
var HOST = "http://www.freebase.com";
var READ = "/api/service/mqlread";
var RAW = "/api/trans/raw/";
var THUMB = "/api/trans/image_thumb/";
var BLURB = "/api/trans/blurb/";
/**
* Send the queries named in the outer envelope object to Metaweb,
* and pass the outer response envelope to the function f. This is a
* variant of the Metaweb.read() function that runs multiple queries.
*/
function sendQueries(queryEnvelope, f) {
// Define a unique function name
var callbackName = "_" + sendQueries.counter++
// Build the URL using encoded query text and the callback name
var url = HOST + READ + "?queries=" + queries +
"&callback=sendQueries." + callbackName
// Create a script tag, set its src attribute and add it to the document
// This triggers the HTTP request and submits the query
var script = document.createElement("script");
script.src = url
document.body.appendChild(script);
};
sendQueries.counter = 0; // Initialize the counter
// These are the queries we issue to find the n newest images and documents
var queries = {
images: {
query: [{
type:"/common/image", id:null, // Return image ids
timestamp:null, sort:"-timestamp", // Most recent first
limit:N, // Only N of them
"/type/content/media_type":null, // Check image type, too
"/type/content/media_type|=":[ // We only want images that are:
"/media_type/image/gif", // GIF or
"/media_type/image/png", // PNG or
"/media_type/image/jpeg" // JPEG
]
}]
},
docs: {
query: [{
type:"/common/document", id:null, // Return document ids
timestamp:null, sort:"-timestamp", // Most recent first
limit:N // Only N of them
}]
}
};
// When the document has loaded, send the queries above to freebase.com.
// Then call the function below with the results
window.onload = function() { sendQueries(queries, displayResults) }
1 If you want more, Section A.3 is a JavaScript-based example that demonstrates Metaweb-powered
autocompletion for HTML text fields.
This example is notable because it uses a more complicated query than the other queries in this
chapter. Example 4.11 uses the result data to generate a page of information about the specified
type. This example is also notable because its HTML output is more complex than previous
examples. The code is well-commented, and if you've understood previous JavaScript examples,
you should not have trouble following this one.
<html>
<head>
<!-- These are the modules we need -->
<script language="javascript" src="json.js"></script>
<script language="javascript" src="metaweb.js"></script>
<script language="javascript">
// This is the query we need to get information about a type.
// Note that we have to fill in the type we're interested in
// before sending this query.
var query = {
type:"/type/type", // The type of our type is /type/type :-)
id:null, // The type we're asking about. Filled in below.
name:null, // What is the human-readable type name?
// Objects with documentation are co-typed /freebase/documented_object
// Here we ask for a short description of the type
"/freebase/documented_object/tip":null,
// Query the specified type. Call displayType() when the results arrive
function queryType(type) {
query.id = type; // Specify the type in the query above
Metaweb.read(query, // Issue the query
displayType); // Pass result object to displayType
}
// Output a link to a type. Use the type id as the link text, and
// make the type name available as a tooltip
function displayTypeLink(out, id, name) {
out.write('<a title="', name, '" onclick="queryType(\'', id, '\')">',
id, '</a>');
}
// This little DOMStream class writes HTML into the element we specify
function DOMStream(id) { // Constructor function
this.elt = document.getElementById(id);
this.buffer = [];
}
DOMStream.prototype.clear = function() { // Erase element content
this.elt.innerHTML = "";
};
DOMStream.prototype.write = function() { // Buffer up all arguments
this.buffer.push.apply(this.buffer, arguments);
};
DOMStream.prototype.flush = function() { // Output all text to the element
this.elt.innerHTML += this.buffer.join("");
this.buffer.length = 0;
};
</script>
<style>
/* Some CSS styles to make everything look good */
body {
font-family: Arial, Helvetica, sans-serif; /* We like sans-serif */
margin-left: .5in; /* Indent everything... */
}
h1, h2 { margin-left: -.25in; } /* ...except headings */
h2 { margin-bottom: 5px; margin-top:10px; }
/* Make tables look nice */
table { border-collapse: collapse; width: 95%;}
th { background-color: #aaa;}
td { background-color: #ddd; padding: 1px 5px 1px 5px; }
/* Our <a> tags don't have hrefs, so we need to style them ourselves */
The best way to follow this tutorial is to try out the queries as you read about them. While
you learn how to make MQL writes, please use the freebase.com sandbox server at
http://sandbox.freebase.com. This server is intended for experimentation. The sandbox hosts
a replica of freebase.com, and this replica is re-created approximately once a week. This
means that any writes you perform (or mistakes you make!) on the sandbox will not persist
longer than a week.
Anyone can read data from Freebase, but before you can execute write queries, you must
register for an account and log in. If you already have an account at www.freebase.com, it
may already have been replicated on the sandbox server. If not, you can create a new account
for yourself on the sandbox. Follow the links from the http://sandbox.freebase.com/
homepage to register. (If you used an invitation code when you registered at
www.freebase.com, just reuse it if you need to register on sandbox.freebase.com.)
Once you are logged on to the sandbox, you can execute MQL write queries using the
Freebase query editor at http://sandbox.freebase.com/view/queryeditor/. Enter queries from
this tutorial, click the write button, and view the results. Just as with reads, you must place
your MQL write queries in an "envelope" object. So instead of entering queries as they are
written in this chapter, you must prefix them with {"query": and end them with an extra
closing }.
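If you keep your write queries as data structures in a script, you can let a JSON serializer add the envelope for you instead of typing the prefix and the closing brace by hand. The Python sketch below shows one way this might be done. The function name for_query_editor is hypothetical, and the query is a write query of the kind used later in this tutorial.

```python
import json

def for_query_editor(write_query):
    """Return JSON text with the query wrapped in a {"query": ...} envelope."""
    return json.dumps({"query": write_query}, indent=2)

note = {
    "create": "unless_exists",
    "type": "/user/docs/default_domain/note",   # substitute your own user name
    "name": "A",
    "id": None,                                  # serialized as JSON null
}
text = for_query_editor(note)
print(text)
```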
the tutorial at the same time. As you know, Metaweb types are defined by regular Metaweb objects
in the database. This means that types are created like any other objects, with MQL queries.
Defining a type with raw MQL is difficult and error-prone, however, so just about everyone defines
types using the freebase.com client.
The type we're creating will represent musical notes, and we'll call it "note". In order to create
it, follow the "My Freebase" link from the freebase.com home page. On the My Freebase page,
click on "Types Created", and enter the name "Note". Figure 5.1 illustrates.
That's all you need to do for now. We'll add some properties to this type later, but now we just
need the type itself. If you click on the name of the newly created type (note that the freebase.com
client capitalizes the name for you) and look at the URL that it takes you to, you'll see that the
name of the new type is /user/username/default_domain/note, where "username" is the
username you logged in with. In the tutorial that follows, you'll see the username docs, but you should
substitute your own name throughout.
Write                                   Result
{                                       {
  "create":"unless_exists",               "create":"created",
  "type":                                 "type":
    "/user/docs/default_domain/note",       "/user/docs/default_domain/note",
  "name":"A",                             "name":"A",
  "id":null                               "id":"#1f8000000000037ffc"
}                                       }
The first line of the query says that we want to create a new object, unless a matching object
already exists. The second line specifies the type of the object we're creating (remember to
substitute your own user name for "docs" here). The third line specifies a value for the name property
of the new object. The fourth line of the write query is a request for the id of the newly created
object. Asking for an id is the only way you are allowed to use null in a write query. You may
not use null or [] for any other property.
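A script that generates write queries can check this rule before submitting anything. The Python sketch below is a client-side sanity check of our own devising, not the validation that Metaweb itself performs: it walks a write query and reports every property other than id whose value is null or [].

```python
def misplaced_placeholders(query, path=""):
    """Return paths of properties, other than id, whose value is null or []."""
    problems = []
    for key, value in query.items():
        here = path + "." + key if path else key
        if value is None or value == []:
            if key != "id":                       # only id may be asked for
                problems.append(here)
        elif isinstance(value, dict):
            problems.extend(misplaced_placeholders(value, here))
        elif isinstance(value, list):
            for item in value:
                if isinstance(item, dict):
                    problems.extend(misplaced_placeholders(item, here))
    return problems

good = {"create": "unless_exists", "type": "/user/docs/default_domain/note",
        "name": "A", "id": None}
bad = dict(good, name=None)

print(misplaced_placeholders(good))
print(misplaced_placeholders(bad))
```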
Now let's look at the response to the write query. The first line is the create property, but its
value has changed from unless_exists to created. This tells us that the object we specified did
not already exist, and Metaweb has created it for us. The second and third lines simply repeat
the type and name properties that we passed in. They don't provide any new information, but
maintain the MQL invariant that responses have the same properties as queries. Finally, the fourth
line returns the id of the newly created object.
The guids used in this tutorial were created on the sandbox server, and are no longer valid,
so you should not try to query these objects directly. Instead, substitute your own user name
into the write queries, and create your own objects, with their own guids, as you follow
along with this tutorial. If you are reading a printed or PDF version of this chapter, note
that the guids have been shortened so that they fit more easily in two-column format.
Now let's see what happens if we run exactly the same query again:
Write Result
{ {
"create":"unless_exists", "create":"existed",
"type": "type":
"/user/docs/default_domain/note", "/user/docs/default_domain/note",
"name":"A", "name":"A",
"id":null "id":"#1f8000000000037ffc"
} }
Now let's force Metaweb to create another new test object for us:
Write Result
{ {
"create":"unconditional", "create":"created",
"type": "type":
"/user/docs/default_domain/note", "/user/docs/default_domain/note",
"name":"A", "name":"A",
"id":null "id":"#1f800000000003800f"
} }
In this query, we've changed the value of the create directive to unconditional. As its name
implies, this value tells Metaweb to create a new object no matter what. Since a new object is
created unconditionally, the value of the create property in the response will always be created.
You can see that a new object was created by comparing the id returned by this query to those
returned by the previous two queries.
We now have two note objects with the name "A". What happens if we run the original unless_ex-
ists write again?
{
"status": "400 Bad request",
"messages": [
{
"query": {
"create": "unless_exists",
"type": "/user/docs/default_domain/note",
"name": "A",
"error_inside": ".",
"id": null
},
"text": "Need a unique result to attach here, not 2",
"args": {
"count": 2,
"guids": [
"#1f8000000000037ffc",
"#1f800000000003800f"
The query fails this time, and returns the JSON object shown above. The "create":"unless_exists"
directive works only if there are 0 or 1 instances of the object. If there is no object that
matches, it creates one. If there is one object that matches, it returns it. But if there is more than
one, it has no way to choose which one to return, and fails with an error message. Note that the
query fails even if we omit "id":null. The lesson here is that if you plan to use unless_exists,
you should use it consistently so you never end up with more than one instance of an object.
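The three outcomes described above (created, existed, and the uniqueness error) can be sketched with a small in-memory model. This is only an illustration of the rules as this tutorial states them, not Metaweb's implementation: the ToyStore class, its write method, and the "#toy" ids are all invented for the example, and matching is reduced to comparing type and name.

```python
import itertools


class ToyStore:
    """Toy in-memory stand-in for a Metaweb database (illustration only)."""

    def __init__(self):
        self.objects = []                 # each object: {"id", "type", "name"}
        self._counter = itertools.count(1)

    def write(self, query):
        """Apply a top-level query carrying a "create" directive."""
        mode = query["create"]
        if mode not in ("unless_exists", "unconditional"):
            raise ValueError(f"unsupported create value: {mode!r}")
        if mode == "unless_exists":
            matches = [o for o in self.objects
                       if o["type"] == query["type"] and o["name"] == query["name"]]
            if len(matches) > 1:
                # Mirrors the error text shown in the response above.
                raise ValueError(
                    f"Need a unique result to attach here, not {len(matches)}")
            if matches:
                return dict(query, create="existed", id=matches[0]["id"])
        # Either nothing matched, or the directive was "unconditional".
        new = {"id": f"#toy{next(self._counter):04d}",
               "type": query["type"], "name": query["name"]}
        self.objects.append(new)
        return dict(query, create="created", id=new["id"])
```

Running the same unless_exists query twice against a fresh ToyStore yields "created" and then "existed" with the same id. Adding a duplicate with "unconditional" and repeating the original query raises the uniqueness error, which is the sequence walked through above.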
Write Result
{ {
"id":"#1f800000000003800f", "id":"#1f800000000003800f",
"name":{ "name":{
"connect":"update", "connect":"updated",
"value":"B", "value":"B",
"lang":"/lang/en" "lang":"/lang/en"
} }
} }
The first line of the query identifies, by id, the object we want to modify. The second and third
lines specify that we want to update the name property of that object so that it refers to the /type/text
value specified by the fourth and fifth lines. (Recall that /type/text is a primitive value that consists
of a string of text and a language identifier for that text. MQL write queries require you to specify
both the value and lang properties when manipulating a name.)
The response looks just like the query except that the value of the connect property has changed
to updated. This tells us that the update we requested has been performed.
We're asking to make a change that has already been made, and Metaweb lets us know this by
setting the connect property of the response to present.
We now have two newly-created objects with the same type and different names. We changed
the name of the second object by updating a /type/text value. /type/text is a primitive type
in Metaweb, so this isn't quite the same thing as a link between two different objects in the data-
base. Now, let's modify the first object (the note A) so that it is a /common/topic in addition to
being a note:
Write Result
{ {
"id":"#1f8000000000037ffc", "id":"#1f8000000000037ffc",
"type":{ "type":{
"connect":"insert", "connect":"inserted",
"id":"/common/topic" "id":"/common/topic"
} }
} }
The first line of the query specifies the object to be modified. (Normally, we'd identify the object
by name and type, but we can't specify the type and add a type in the same query. The name "A"
is probably not unique by itself, so we specify the object we want to modify by id.) The second
and third lines specify that we want to insert a new connection between this object and another
object, and that this new connection should use the type property. The fourth line specifies, by
id, the object that is being connected to.
Note that the value of the connect directive is insert instead of update, which is what we used
above. The difference between the two is simple. Use "connect":"update" for properties that
have a unique value (and for the name property, which is unique on a per-language basis). Use
"connect":"insert" for properties, such as type, that can have more than one value. You are
also allowed to use "connect":"insert" with unique properties if there is not already a value
for that property.
The response object sets the value of the connect directive to inserted, telling us that the insertion
was successful. Our note named "A" is now also a /common/topic. If you visit your "My Freebase"
page, the note object should now be visible under the "Topics Created" heading. This is the main
reason to use the /common/topic type on the objects you create: it allows them to work well with
the freebase.com client.
Write Result
{ {
"id":"#1f8000000000037ffc", "id":"#1f8000000000037ffc",
"type":{ "type":{
"connect":"insert", "connect":"present",
"id":"/common/topic" "id":"/common/topic"
} }
} }
We're asking to insert /common/topic into a set of types that already includes /common/topic,
and we get the response present. It tells us that this value is already in the set and that nothing
has changed. (Non-unique properties in Metaweb are like sets: they do not allow duplicates.)
Let's do a quick read query to confirm that our object is a member of two types:
Read
{
"id":"#1f8000000000037ffc",
"type":[]
}
Result
{
"id":"#1f8000000000037ffc",
"type":[
"/user/docs/default_domain/note",
"/common/topic"
]
}
Write
{
"id":"#1f8000000000037ffc",
"type":{
"connect":"delete",
"id":"/common/topic"
}
}
Result
{
"id":"#1f8000000000037ffc",
"type":{
"connect":"deleted",
"id":"/common/topic"
}
}
This query looks just like the query we used to add the type, except that we've changed "insert"
to "delete". And Metaweb's response looks just like the response to the insertion, except that
"inserted" has changed to "deleted". You can verify that the object is no longer a /common/topic
by visiting "My Freebase" on sandbox.freebase.com and noting that it no longer appears in the
"Topics Created" list.
At this point, you probably have a pretty good idea what will happen if we re-run the query:
Write Result
{ {
"id":"#1f8000000000037ffc", "id":"#1f8000000000037ffc",
"type":{ "type":{
"connect":"delete", "connect":"absent",
"id":"/common/topic" "id":"/common/topic"
} }
} }
We asked Metaweb to remove /common/topic from a set that did not contain /common/topic,
so it returned "absent" to indicate that nothing has been changed.
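The status words we have seen so far for the connect directive can be summarized in a short sketch. The connect function below is hypothetical and models a property as a plain Python list. Two behaviors are assumptions rather than anything this chapter states: inserting a second value into a unique property raises an error, and update replaces whatever values were there.

```python
def connect(values, directive, value, unique=False):
    """Apply one connect directive to a property's list of values, in place.

    Returns the status word the response would carry in its connect property.
    """
    if directive == "insert":
        if value in values:
            return "present"              # sets do not hold duplicates
        if unique and values:
            # Assumption: a unique property that already has a value needs update.
            raise ValueError("unique property already has a value; use update")
        values.append(value)
        return "inserted"
    if directive == "update":
        if values == [value]:
            return "present"              # the requested change was already made
        status = "updated" if values else "inserted"
        values[:] = [value]               # replace any previous value
        return status
    if directive == "delete":
        if value not in values:
            return "absent"               # nothing to remove
        values.remove(value)
        return "deleted"
    raise ValueError(f"unknown connect directive: {directive!r}")
```

For example, calling connect(types, "insert", "/common/topic") twice on the same list returns "inserted" and then "present", and two matching delete calls return "deleted" and then "absent".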
The MQL write grammar has no syntax for deleting objects themselves. The closest thing to de-
leting an object is to delete all connections from that object to others. If an object has no type,
no name, and no other properties of interest, then it becomes effectively unreachable, and is almost
as good as gone. Note, however, that Metaweb maintains a modification history for each object.
When you view an object in the freebase.com client, you'll see a "History" link at the bottom of
each page. Clicking this link allows you to view the change history for the object, and allows
you to undo changes, including deletions.
When an object has had all its links deleted, it can still be queried by guid or creator. (Metaweb
does not allow these read-only properties to be deleted.) In practice, however, unreachable objects
will only be found by determined searchers, and their continued existence is very unlikely to affect
Let's use this unlinking technique to "delete" the two note objects we've created:
Write Result
[{ [{
"id":"#1f8000000000037ffc", "id":"#1f8000000000037ffc",
"type":{ "type":{
"connect":"delete", "connect":"deleted",
"id":"/user/docs/default_domain/note" "id":"/user/docs/default_domain/note"
}, },
"name":{ "name":{
"connect":"delete", "connect":"deleted",
"value":"A", "value":"A",
"lang":"/lang/en" "lang":"/lang/en"
} }
},{ },{
"id":"#1f800000000003800f", "id":"#1f800000000003800f",
"type":{ "type":{
"connect":"delete", "connect":"deleted",
"id":"/user/docs/default_domain/note" "id":"/user/docs/default_domain/note"
}, },
"name":{ "name":{
"connect":"delete", "connect":"deleted",
"value":"B", "value":"B",
"lang":"/lang/en" "lang":"/lang/en"
} }
}] }]
Note that this write query is really two separate queries, included within square brackets. The
mqlwrite service (the topic of Chapter 6) accepts submissions of multiple writes at once. Note
that names are deleted with "connect":"delete", even though they are unique and were originally
created with "connect":"update". You must specify the lang property explicitly when deleting
a name.
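If you need to unlink many objects, it may be convenient to generate these queries rather than type them. The following is a minimal sketch, assuming you already know each object's id, types, and names. The unlink_query helper is hypothetical and only builds the query as a Python dictionary; submitting it to the mqlwrite service is not shown.

```python
import json


def _one_or_many(items):
    """Return a single sub-query as an object, several as an array."""
    return items[0] if len(items) == 1 else items


def unlink_query(obj_id, types, names):
    """Build a write query that detaches the given types and names from obj_id.

    names is a list of (value, lang) pairs. The lang is always spelled out,
    because deleting a name requires an explicit lang property.
    """
    query = {"id": obj_id}
    if types:
        query["type"] = _one_or_many(
            [{"connect": "delete", "id": t} for t in types])
    if names:
        query["name"] = _one_or_many(
            [{"connect": "delete", "value": value, "lang": lang}
             for value, lang in names])
    return query


def unlink_batch(objects):
    """Serialize several unlink queries as one JSON array of top-level queries."""
    return json.dumps([unlink_query(*obj) for obj in objects], indent=2)
```

Calling unlink_query("#1f8000000000037ffc", ["/user/docs/default_domain/note"], [("A", "/lang/en")]) produces the first of the two queries shown above.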
As a final test, let's query the first of these objects (by id) and find out what little information it
still carries:
Read
{
"id":"#1f8000000000037ffc",
"*":null
}
Result
{
"id":"#1f8000000000037ffc",
"guid":"#1f8000000000037ffc",
"name":null,
"type":[],
"key":[],
"creator":"/user/docs",
"permission":"/boot/all_permission",
"timestamp":"2006-11-08T20:00:02.0000Z"
}
Example 6.7 in Chapter 6 is a command-line script for unlinking Metaweb objects in this way.
Its default behavior is to delete all objects you have created. You may find this script useful to
wipe your slate clean while experimenting with MQL writes.
When you submit multiple top-level write queries to Metaweb at the same time, it is natural
to ask whether they are executed in order, and whether a query can depend on an object
created by a previous query. The answer to both questions is no. The reason is a good one,
however: when multiple queries are submitted at the same time, they are executed atomically.
Either all are executed or none are executed.
In order to implement this atomic behavior, the Metaweb server first tests each query to
determine whether it will succeed. It does this without actually executing the query. If all
queries pass the test, then all are executed. Note, however, that this means that each query
must be able to succeed before any other queries have been run. Therefore, the queries
must be completely independent of each other. And since they are independent, there is
really no way to tell what order Metaweb executes them in.
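The test-first strategy described here can be illustrated with a toy model. In this sketch the database is just a Python set of names and the only operations are "add" and "remove". Both are invented for the example and stand in for real write queries. The point is only that every query is checked against the state as it was before the batch began.

```python
def run_batch(state, queries):
    """Run queries against a set of names, all or nothing.

    Each query is ("add", name) or ("remove", name). Every query is tested
    against the state as it was before the batch; nothing is applied unless
    all of them pass. Returns (succeeded, list_of_failing_queries).
    """
    def would_succeed(op, name):
        if op == "add":
            return name not in state
        if op == "remove":
            return name in state
        raise ValueError(f"unknown operation: {op!r}")

    # Phase 1: test every query without changing anything.
    failures = [q for q in queries if not would_succeed(*q)]
    if failures:
        return False, failures

    # Phase 2: all tests passed, so apply every query.
    for op, name in queries:
        if op == "add":
            state.add(name)
        else:
            state.discard(name)
    return True, []
```

With a state of {"C", "G"}, the batch [("add", "D"), ("remove", "D")] is rejected as a whole: the second query cannot succeed before the first has run, so neither is executed.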
Write Result
{ {
"create":"unless_exists", "create":"created",
"type":"/user/docs/default_domain/note", "type":"/user/docs/default_domain/note",
"name":"C#", "name":"C#",
"id":null "id":"#1f800000000104befe"
} }
Now contrast this with the query that "deletes" the object by unlinking its type and name:
Write Result
{ {
"id":"#1f800000000104befe", "id":"#1f800000000104befe",
"type":{ "type":{
"connect":"delete", "connect":"deleted",
"id":"/user/docs/default_domain/note" "id":"/user/docs/default_domain/note"
}, },
"name":{ "name":{
"connect":"delete", "connect":"deleted",
"value":"C#", "value":"C#",
"lang":"/lang/en" "lang":"/lang/en"
The creation query is much more compact because we are able to specify the type as a single id
and the name as a single string. In the deletion query, we must specify the expanded objects.
There are three factors that interact to make the creation query shorter. First, recall from Chapter 3
that every type has a default property. For value types such as /type/text (the type of the name
property) the default property is value. For core types in the /type domain, the default property
is id. For all other types, the default property is name. So in the creation query,
"type":"/user/docs/default_domain/note" is shorthand (but see the caution below!) for:
"type": { "id":"/user/docs/default_domain/note" }
The second factor that makes the creation query so compact is the fact that when you specify a
default property rather than a full object in a MQL write query, Metaweb assumes an implicit
"connect":"insert". So writing "type":"/user/docs/default_domain/note" is kind of (but
not exactly: see the caution that follows) like writing:
"type": {
"connect":"insert",
"id":"/user/docs/default_domain/note"
}
The third factor that makes the creation query compact is that the language of /type/text values
is automatically set to the default of English, or to your preferred language as specified by a
parameter to the mqlwrite service. (See Chapter 6 for details.)
All three factors come into play when we write "name":"C#". "C#" becomes the value of the
default property, which is the value. An implicit "connect":"insert" is added. And a lang
property is added to specify /lang/en, or whatever language we are using. So "name":"C#" ex-
pands to (but see the caution!):
"name": {
"connect":"insert",
"value":"C#",
"lang":"/lang/en"
}
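The three factors can be combined into one small function. This is a rough sketch of the expansion only. The expand_shorthand and default_property helpers are hypothetical, the list of value types is partial, and the expanded form is not always equivalent to the compact one when unless_exists is involved.

```python
# Partial list of value types; their default property is "value".
VALUE_TYPES = {
    "/type/int", "/type/float", "/type/boolean", "/type/id",
    "/type/text", "/type/key", "/type/rawstring",
}


def default_property(expected_type):
    """Return the default property for a property's expected type."""
    if expected_type in VALUE_TYPES:
        return "value"
    if expected_type.startswith("/type/"):
        return "id"                       # core types in the /type domain
    return "name"                         # all other types


def expand_shorthand(value, expected_type, lang="/lang/en"):
    """Expand a bare value into the explicit sub-query it roughly stands for."""
    if isinstance(value, dict):
        return value                      # already written out in full
    expanded = {"connect": "insert", default_property(expected_type): value}
    if expected_type == "/type/text":
        expanded["lang"] = lang           # text values also carry a language
    return expanded
```

Here expand_shorthand("C#", "/type/text") gives the expanded name shown above, and expand_shorthand("/user/docs/default_domain/note", "/type/type") gives the expanded type. The sketch assumes that the expected type of the type property is /type/type.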
From the explanation above, you might assume that the compact creation query with which
we began this section could be written equivalently (but less compactly) as:
{
"create":"unless_exists",
"id":null,
"name": "C#",
"type": {
"connect":"insert",
"id":"/user/docs/default_domain/note"
}
}
If the queries used "create":"unconditional" then they would be the same. But the
meaning of unless_exists is different for the two queries. The original compact query
could be translated as If you can find a Note object named "C#", return its id. Otherwise,
create a new Note object, name it "C#", and return its id.
But this variant that expands the type property is different in a subtle but important way.
It tells Metaweb: find or create an object named "C#", and then add Note to its set of types.
The difference between the two queries is critical if there is already an object (of type
/programming/language, perhaps) with the name "C#".
Here's another way to think about this. When the type is specified by id, this is a constraint
on the query. Metaweb must find an object that matches, or must construct one. When the
type is specified in a sub-query with an explicit connect directive the sub-query is not a
constraint, and does not affect the results of the unless_exists search.
• Visit your "My Freebase" page on sandbox.freebase.com and click on the Note type under
Types Created.
• Click on the Add a New Property button, enter the property name "next" into the text field
that appears, and click the Save button.
• Enter the type name "Note" into the Expected Type field. (You may see a drop-down list
containing your version of the Note type and many other developers' versions. Select the
one that is followed by your username in parentheses.)
The next property we've just added to our note type allows us to link one note to another in a
chain or a ring. We'll use this property to link each note to its perfect fifth: the note that is 7
semitones higher. (Usually, this is 5 white keys on a piano keyboard, which is probably why it is
called a fifth.) If we start with the note C, we find that its fifth is the note G. Before we start using
the next property to represent fifths, however, let's run a simple query that will give us a convenient
shortcut:
Write
{
"id":"/user/docs/default_domain/note",
"key":{
"connect":"insert",
"namespace":"/user/docs",
"value":"note"
}
}
Result
{
"id":"/user/docs/default_domain/note",
"key":{
"namespace":"/user/docs",
"connect":"inserted",
"value":"note"
}
}
This query specifies our Note type object by id, and then adds a new /type/key value to its key
property. What we've done is to make /user/docs/note a synonym for /user/docs/default_do-
Now, let's create Note objects to represent the notes C and G. Note that the following query is
two independent queries in an array:
Write Result
[{ [{
"create":"unless_exists", "create":"created",
"id":null, "id":"#1f80000000000384b0",
"type":"/user/docs/note", "type":"/user/docs/note",
"name":"C" "name":"C"
},{ },{
"create":"unless_exists", "create":"created",
"id":null, "id":"#1f80000000000384b4",
"type":"/user/docs/note", "type":"/user/docs/note",
"name":"G" "name":"G"
}] }]
We've asked Metaweb to create two Note objects, with names C and G, and to return their ids to
us. Now, let's insert the link that indicates that G is the fifth of C:
Write Result
{ {
"id":"#1f80000000000384b0", "id":"#1f80000000000384b0",
"/user/docs/note/next":{ "/user/docs/note/next":{
"connect":"update", "connect":"inserted",
"id":"#1f80000000000384b4" "id":"#1f80000000000384b4"
} }
} }
This compact query identifies both note objects by id and connects them with a connect directive.
Since we defined the next property to be unique, it uses "connect":"update" instead of "con-
nect":"insert". Note that since this query never specifies the type of the objects, we must use
a fully-qualified property name for the next property. You can verify that this query did what
we intended using the freebase.com client. Visit My Freebase on sandbox.freebase.com, and
click on the Note type. On the page for the Note type, you should see a list of instances of that
type. Click on the one named "C", and you'll see that it includes a hyperlink to the note G labeled
"Next".
The linking technique shown above is straightforward and easy to understand. It uses one query
to create (or look up) the two objects to be linked. Then it uses a second simple query to connect
the two objects. It is usually possible, however, to combine the creation and linking into a single
query. The following query, for example, sets the next property of the note G to a newly-created
note named D:
Notice that there is no connect directive here. Since the create directive is nested in this query,
the connection is implicit.
Write Result
{ {
"create":"unless_exists", "create":"created",
"type":"/user/docs/note", "type":"/user/docs/note",
"name":"B flat", "name":"B flat",
"next":{ "next":{
"create":"unless_exists", "create":"created",
"type":"/user/docs/note", "type":"/user/docs/note",
"name":"F", "name":"F",
"next":{ "next":{
"create":"unless_exists", "create":"connected",
"type":"/user/docs/note", "type":"/user/docs/note",
"name":"C" "name":"C"
} }
} }
} }
This query creates a note F and links it to the existing note C, and then creates a note B flat and
links it to the new note F. Note that the query uses "create":"unless_exists" three times. The
response includes "created" twice for the newly created notes. But for the note C, which already
exists, the response says "create":"connected". This tells us that the note C already existed,
but that a new connection has been made to it. If we rerun the query, we get "create":"existed"
all three times, since the objects and links already exist.
The following query is like the one above, but shorter, and with one important tweak:
Write Result
{ {
"create":"unless_exists", "create":"created",
"type":"/user/docs/note", "type":"/user/docs/note",
"name":"E flat", "name":"E flat",
This query creates a new note E flat, and connects it to B flat. Notice, however, that in the nested
clause of the query, we used a different form of the create directive: "create":"unless_connec-
ted". And in the response we have a "create":"created". If you examine the list of Note in-
stances in the freebase.com client, you'll see that there are now two of them named "B flat". If
you use unless_connected, then Metaweb looks for a matching object that is already connected.
If it cannot find one, it creates a new one and connects it. In this case, there was an existing Note
object named B flat, but it was not already connected, so the query created a new one. If we re-
run the query, however, it simply returns "create":"existed" because the object and the con-
nection exist.
Note that unless_connected only makes sense in nested clauses. If we change the outermost
unless_exists in the query above to unless_connected, Metaweb complains: Can't use 'create':
'unless_connected' at the root of the query.
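The difference between the nested forms of the create directive comes down to which existing objects each one is willing to reuse. The sketch below tabulates the response values described in this section. The nested_create function is hypothetical, and it ignores the case where more than one object matches.

```python
def nested_create(mode, linked, exists_elsewhere):
    """Return the response value for a create directive in a nested clause.

    linked: a matching object is already connected to the parent.
    exists_elsewhere: a matching object exists but is not connected.
    """
    if mode == "unconditional":
        return "created"                  # always makes a new object
    if mode == "unless_connected":
        # Only an already-connected match is reused.
        return "existed" if linked else "created"
    if mode == "unless_exists":
        if linked:
            return "existed"              # object and link are both there
        if exists_elsewhere:
            return "connected"            # reuse the object, add the link
        return "created"
    raise ValueError(f"unknown create value: {mode!r}")
```

The E flat example above corresponds to nested_create("unless_connected", False, True): an unconnected B flat existed, but the directive created a second one anyway.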
Write Result
{ {
"type":"/user/docs/note", "type":"/user/docs/note",
"name":"E flat", "name":"E flat",
"next":{ "next":{
"connect":"delete", "connect":"deleted",
"type":{ "type":{
"connect":"delete", "connect":"deleted",
"id":"/user/docs/note" "id":"/user/docs/note"
}, },
"name":{ "name":{
"connect":"delete", "connect":"deleted",
"value":"B flat", "value":"B flat",
Note that the query above does two things. It disconnects the name and type of the extra B flat
object, and also disconnects that object from E flat. Now all we have to do is connect E flat to
the valid B flat object. This should be easy for you now:
Write
{
"type":"/user/docs/note",
"name":"E flat",
"next":{
"connect":"insert",
"type":"/user/docs/note",
"name":"B flat"
}
}
Result
{
"type":"/user/docs/note",
"name":"E flat",
"next":{
"type":"/user/docs/note",
"connect":"inserted",
"name":"B flat"
}
}
"create":"unless_exists" Look for the object in the database and create a new one
if a match cannot be found.
"create":"created" Indicates that a new object has been created. This is always the
response for unconditional directives, but may also be returned
by unless_exists and unless_connected directives.
"create":"connected" Indicates that the object already existed but a connection has
been made. This response is only possible for unless_exists
directives that are nested within a parent query.
"connect":"update" Use this form to attach a value or object to a unique property, repla-
cing any value or object that was previously connected.
"connect":"delete" Use this form to detach a value or object from a property. It works
for unique and non-unique properties.
"connect":"absent" Indicates that a delete directive was not successful because the
connection to be deleted did not exist.
If you appreciated not having to type default_domain/ in the examples above, you can use the
same shortcut for the new Chord type:
Write Result
{ {
"create":"unless_exists", "create":"created",
"name":"CEG", "name":"CEG",
"type":[ "type":[
"/common/topic", "/common/topic",
"/user/docs/chord" "/user/docs/chord"
], ],
"note":[{ "note":[{
"connect":"insert", "connect":"inserted",
"type":"/user/docs/note", "type":"/user/docs/note",
"name":"C" "name":"C"
},{ },{
"connect":"insert", "connect":"inserted",
"type":"/user/docs/note", "type":"/user/docs/note",
"name":"G" "name":"G"
},{ },{
"create":"unless_exists", "create":"created",
"type":"/user/docs/note", "type":"/user/docs/note",
"name":"E" "name":"E"
}] }]
} }
• It specifies the ids of two types within a JSON array. The created object will be both a Chord
and a Topic. (We'll say more about arrays in write queries and about the /common/topic type
below).
• It specifies three notes, as expanded objects, within a JSON array. These are the set of values
for the note property of the chord.
• Note objects C and G exist already, so this query uses "connect":"insert" for these two. We
haven't created an object to represent E yet, so the query creates and connects it with "cre-
ate":"unless_exists".
So far in this chapter, we've only seen square brackets in write queries when we were bundling
up multiple top-level queries to be submitted to Metaweb in a single batch. The MQL write
grammar is actually more general than this: nested queries can also be collected into an array,
When we specify more than one type for an object, we use a JSON array. But the Metaweb
object model represents the types as an unordered set, so the order in which we specify
them should not matter. In fact, however, it does. The last type in the array of types is used
to qualify any unqualified property names that are not /type/object properties.
In the query above, if we had specified /user/docs/chord first, and /common/topic second,
then Metaweb would have assumed that the unqualified note property meant /common/top-
ic/note, and this would have caused an error since there is no such property. If you don't
want to rely on the order of the types, you can just be explicit and use the fully-qualified
names of all properties, such as /user/docs/chord/note.
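The qualification rule can be written down in a few lines. This is a sketch of the rule as described here, not of Metaweb's actual resolution code. The qualify function is hypothetical, and the set of /type/object properties is taken from the "*":null result shown earlier in this chapter, so it may be incomplete. Directives such as create and connect are not properties and are not handled.

```python
# Properties common to all objects, as seen in the "*":null result earlier.
OBJECT_PROPERTIES = {
    "id", "guid", "name", "type", "key",
    "creator", "permission", "timestamp",
}


def qualify(prop, types):
    """Return the fully-qualified name for a property used in a query.

    types is the array of type ids given in the query, in the order written.
    """
    if prop.startswith("/"):
        return prop                       # already fully qualified
    if prop in OBJECT_PROPERTIES:
        return "/type/object/" + prop
    return types[-1] + "/" + prop         # the last type in the array wins
```

With the types in the order used above, qualify("note", ["/common/topic", "/user/docs/chord"]) gives /user/docs/chord/note. With the order reversed, it gives /common/topic/note, which does not exist.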
Fortunately, the freebase.com client makes it very easy to define such a property:
• Go to your My Freebase page on sandbox.freebase.com, and click on your Note
type under User's Types.
• Click on the "View Schema" link on the page for your Note type.
• Look near the bottom of the schema page (you may have to scroll down) for the heading
Suggested Properties. You should see something like what is shown in Figure 5.3.
• This tells you that the type Chord has a property named Note. 1 The client is suggesting that
you add a reciprocal property to expose the other direction of the link. The link is already there:
all that is required is that you give this property a name so that you can refer to it.
• Since the Chord property that refers to Notes is named note, it seems sensible to name the
Note property that refers to Chords chord. Click the Edit button or double-click the
"double-click to edit" text message. Then type in "chord" and hit Enter or click Save.
1
The freebase.com client capitalizes type and property names. These are the "human-readable" forms; their ids are still lowercase
(/user/docs/chord and /user/docs/chord/note).
You have now created the property /user/docs/note/chord, which is the reciprocal property
of /user/docs/chord/note. Since we now have a pair of properties, we can take advantage of
the bi-directional nature of the links between chords and notes.
Let's experiment with this. First, we'll query the Chord CEG to find out what notes it contains:
Read Result
{ {
"type":"/user/docs/chord", "type":"/user/docs/chord",
"name":"CEG", "name":"CEG",
"note":[] "note":["C","G","E"]
} }
This result is unsurprising, given that the /user/docs/chord/note property is the one we defined
originally. Now let's turn the query around and try out the reciprocal /user/docs/note/chord
property we've just added. What chords is the note C a part of?
Read Result
{ {
"type":"/user/docs/note", "type":"/user/docs/note",
"name":"C", "name":"C",
"chord":[] "chord":["CEG"]
} }
The note C "knows" that it is part of the chord CEG even though we never set its chord property.
Setting a property automatically causes its reciprocal property to be set as well. Because links
are bi-directional in Metaweb, this is all automatic.
Write Result
{ {
"create":"unless_exists", "create":"created",
"type":["/common/topic", "type":["/common/topic",
"/user/docs/chord"], "/user/docs/chord"],
"name":"BFG" "name":"BFG"
} }
Write Result
[{ [{
"create":"unless_exists", "create":"created",
"type":"/user/docs/note", "type":"/user/docs/note",
"name":"B", "name":"B",
"chord": { "chord":{
"connect":"insert", "connect":"inserted",
"type":"/user/docs/chord", "type":"/user/docs/chord",
"name":"BFG" "name":"BFG"
} }
},{ },{
"create":"unless_exists", "create":"existed",
"type":"/user/docs/note", "type":"/user/docs/note",
"name":"F", "name":"F",
"chord": { "chord":{
"connect":"insert", "connect":"inserted",
"type":"/user/docs/chord", "type":"/user/docs/chord",
"name":"BFG" "name":"BFG"
} }
},{ },{
"create":"unless_exists", "create":"existed",
"type":"/user/docs/note", "type":"/user/docs/note",
"name":"G", "name":"G",
"chord": { "chord":{
"connect":"insert", "connect":"inserted",
"type":"/user/docs/chord", "type":"/user/docs/chord",
"name":"BFG" "name":"BFG"
} }
}] }]
This query connects the BFG chord to the chord property of the notes B, F, and G. (It also creates
the note B, which didn't exist yet.) Now let's ask BFG what notes it contains:
Read Result
{ {
"type":"/user/docs/chord", "type":"/user/docs/chord",
"name":"BFG", "name":"BFG",
"note":[] "note":["B","F","G"]
} }
Once again, we've demonstrated that we can set a property of an object by setting the reciprocal
property to refer to that object.
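One way to picture why this works is to imagine that each link is stored only once and that the reciprocal property is simply a second name for reading or writing it from the other end. The sketch below models that idea. The LinkStore class and the RECIPROCAL_OF table are invented for illustration and say nothing about how Metaweb stores links internally.

```python
# Hypothetical table: reciprocal property -> the forward property it mirrors.
RECIPROCAL_OF = {"/user/docs/note/chord": "/user/docs/chord/note"}


class LinkStore:
    """Stores each link once, as (source, forward_property, target)."""

    def __init__(self):
        self.links = set()

    def insert(self, obj, prop, other):
        if prop in RECIPROCAL_OF:
            # Writing through the reciprocal stores the same link, flipped.
            self.links.add((other, RECIPROCAL_OF[prop], obj))
        else:
            self.links.add((obj, prop, other))

    def read(self, obj, prop):
        if prop in RECIPROCAL_OF:
            forward = RECIPROCAL_OF[prop]
            return sorted(s for s, p, t in self.links
                          if p == forward and t == obj)
        return sorted(t for s, p, t in self.links
                      if p == prop and s == obj)
```

Inserting C, E, and G through /user/docs/chord/note on CEG makes read("C", "/user/docs/note/chord") return ["CEG"]. Inserting BFG through /user/docs/note/chord on the note B makes B visible from the chord side. The sorting is only there to make the toy output stable.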
Read
{
"type":"/type/property",
"id":"/user/docs/chord/note",
"expected_type":null,
"master_property":null,
"reverse_property":null
}
Result
{
"type":"/type/property",
"id":"/user/docs/chord/note",
"expected_type":"/user/docs/note",
"master_property":null,
"reverse_property":"/user/docs/default_domain/note/chord"
}
We find that the note property of Chord has an expected_type of Note, as expected. More inter-
estingly, though, we find a property named reverse_property that refers to the chord property
of the Note type. 2 So let's query that property now:
Read
{
"type":"/type/property",
"id":"/user/docs/note/chord",
"expected_type":null,
"master_property":null,
"reverse_property":null
}
Result
{
"type":"/type/property",
"id":"/user/docs/note/chord",
"expected_type":"/user/docs/chord",
"master_property":"/user/docs/default_domain/chord/note",
"reverse_property":null
}
This property has a reverse_property of null, but has a property named master_property that
refers back to the first property we looked at.
Reciprocal properties are linked to each other via the master_property and reverse_property
properties. When one property is set, its reciprocal, if it has one, is automatically set. The recipro-
city is symmetrical: the terms "master" and "reverse" imply a directionality or hierarchy, but the
property labeled "master" has no special status or preference over the property labeled "reverse".
(When types are created with the freebase.com client, the property created first is the master
property.) 3
2
Note that default_domain has crept back into the result. We've created shortcuts for our types in easier-to-type namespaces, but
their original location was under /user/docs/default_domain.
3
The master_property and reverse_property properties are themselves reciprocal properties. (Verifying this with a MQL
query is left as an exercise for the reader.) This means that if you set p.reverse_property to q then q.master_property is
automatically set to p.
In order to demonstrate how to create an ordered collection, we'll need a suitable type. Chords
don't work: the notes of a chord are played simultaneously, and no order is required. A broken
chord (or arpeggio) is a chord in which the notes are played sequentially. Since there is a sequence,
there is an order. Use the freebase.com client to define a new type named "Arpeggio" in the
sandbox. Give it a property named "note" whose expected type is Note. Arpeggio is actually just
like Chord: only the names are different. To save yourself typing, copy the type from
/user/docs/default_domain/arpeggio (use your own username) to /user/docs/arpeggio, just
as we did for the Note and Chord types:
{
"id":"/user/docs/default_domain/arpeggio",
"key":{
"connect":"insert",
"namespace":"/user/docs",
"value":"arpeggio"
}
}
Now with our type defined, let's create our first ordered collection:
Write Result
{ {
"create":"unless_exists", "create":"created",
"type":"/user/docs/arpeggio", "type":"/user/docs/arpeggio",
"name":"broken CEG", "name":"broken CEG",
"note": [{ "note":[{
"index":0, "index":0,
"type":"/user/docs/note", "type":"/user/docs/note",
"name":"C" "name":"C"
},{ },{
"index":1, "index":1,
"type":"/user/docs/note", "type":"/user/docs/note",
"name":"E" "name":"E"
},{ },{
"index":2, "index":2,
"type":"/user/docs/note", "type":"/user/docs/note",
"name":"G" "name":"G"
}] }]
} }
Creating an arpeggio is much like creating a chord. Two things stand out about this query, however.
First, each note has an index associated with it, and there are no connect directives. The index
There are some strict rules that govern the use of the index property:
• The index property may not appear within a top-level query. Indexes don't apply to objects
but to the links between objects. The index property is used in sub-queries to specify the order
of the links between the parent object and the children.
• If there are n sibling sub-queries that specify an index, the values specified must include every
integer from 0 to n-1. You must always start with zero. You may not include duplicate indexes,
and you may not skip an index. It is not required that every element of a sub-query array have
an index. Metaweb collections can be partially ordered and partially unordered.
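Because the second rule is easy to violate when queries are generated by a program, it can be worth checking sibling sub-queries before submitting them. The following is a minimal sketch of such a check. The check_indexes function is hypothetical and runs entirely on the client side.

```python
def check_indexes(subqueries):
    """Check the index values of sibling sub-queries in a write query.

    Sub-queries without an index are ignored, since a collection may be
    partially ordered. Returns the number of indexed sub-queries, or raises
    ValueError if the indexes are not exactly 0 through n-1.
    """
    indexes = [q["index"] for q in subqueries if "index" in q]
    expected = list(range(len(indexes)))
    if sorted(indexes) != expected:
        raise ValueError(
            f"indexes must be exactly {expected}, got {sorted(indexes)}")
    return len(indexes)
```

The check accepts indexes 0, 1, 2 in any order and accepts arrays in which some elements have no index. It rejects an array that starts at 1, repeats an index, or skips one.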
This second rule may seem surprisingly strict, but remember that despite the name "index", the
values we specify with the index property are not array indexes. The numbers are merely a simple
way to specify a series of less than and greater than relationships. The requirement that indexes
always run from 0 through n-1 means that there is really no way to insert an element at a given
location with an ordered collection, and no way to move an element from one spot to another.
Suppose, for example, that we want to insert the notes B and F into our CEG arpeggio at the
beginning so the arpeggio consists of the five sequential notes BFCEG:
Write Result
{ {
"type":"/user/docs/arpeggio", "type":"/user/docs/arpeggio",
"name":"broken CEG", "name":"broken CEG",
"note": [{ "note":[{
"create":"unless_exists", "create":"connected",
"index":0, "index":0,
"type":"/user/docs/note", "type":"/user/docs/note",
"name":"B" "name":"B"
},{ },{
"create":"unless_exists", "create":"connected",
"index":1, "index":1,
"type":"/user/docs/note", "type":"/user/docs/note",
"name":"F" "name":"F"
}] }]
} }
This query demonstrates that the index property can be used along with a create directive. The
notes already exist, but are not connected so we get "create":"connected" in the response.
What has this query actually done? We can use a read query to ask for the notes of the arpeggio
in order and see if we've accomplished what we wanted to. But before we do, let's consider what
information we've given Metaweb. Previously, we told Metaweb that C comes before E and E
comes before G. Now, we've also told Metaweb that B comes before F. But this isn't enough.
Read
{
  "type":"/user/docs/arpeggio",
  "name":"broken CEG",
  "note":[{
    "index":null,
    "name":null,
    "sort":"index"
  }]
}
Result
{
  "type":"/user/docs/arpeggio",
  "name":"broken CEG",
  "note":[
    {"index":0, "name":"B"},
    {"index":1, "name":"F"},
    {"index":2, "name":"C"},
    {"index":3, "name":"E"},
    {"index":4, "name":"G"}
  ]
}
Metaweb actually returns the notes in the order we wanted! The fact that Metaweb inserts new
elements at the beginning of an ordered collection is an implementation detail, however, and is
not behavior that is guaranteed. In practice, if you want to define a particular ordering for the
elements of a collection, you must write a single query that enumerates each of those elements and
gives them all an index. Any subsequent insertions or shuffles require you to submit a new query
that again lists all elements and defines their order. [4]
{
  "type":"/user/docs/arpeggio",
  "name":"broken CEG",
  "note": [{
    "create":"unless_exists",
    "index":0,
    "type":"/user/docs/note",
    "name":"B"
  },{
    "create":"unless_exists",
    "index":1,
    "type":"/user/docs/note",
    "name":"F"
  },
  { "index":2, "type":"/user/docs/note", "name":"C" },
  { "index":3, "type":"/user/docs/note", "name":"E" },
  { "index":4, "type":"/user/docs/note", "name":"G" }]
}
This query inserts the two new notes and re-iterates the order of the three existing ones, to specify
a complete ordering of five notes with indexes 0 through 4.
[4] What if, when we'd inserted B and F, we'd also included C, with index 2, in the query? Then Metaweb would know that B is less than
F and F is less than C. And it already knows that C is less than E and E is less than G. Wouldn't that define a complete ordering? This
sounds plausible, but, in fact, after such a query Metaweb would have an ordering for the subset BFC and another ordering for the
subset EG, and it still would not know the relationship between those two subsets.
Metaweb's ordered collections are not arrays and do not behave like arrays. The requirement that
you re-specify the complete ordering even for simple insertions demonstrates this. And it also
makes it clear that ordering is only practical for relatively small and static collections. It makes
good sense to define an ordering for the tracks on an album, for example. It makes less sense to
define an ordering for the albums recorded by a band, however, because this set of albums may
change with time and a database maintainer would be required to respecify the complete
discography each time a new album was added. And it makes no sense at all to try to specify an ordering
for bands (by specifying an index for each instance property of the /music/artist type): there
are simply too many of them.
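Because every insertion requires the complete ordering to be restated, it can be convenient to keep the ordered list in your own code and regenerate the write query from it. The sketch below illustrates the idea; respecify_order is a hypothetical helper, not a Metaweb function, and it assumes that elements are identified by name and type as in the arpeggio examples.

```python
def respecify_order(parent_type, parent_name, prop, child_type,
                    names, new_names=()):
    """Build a write query that restates the complete order of a collection.

    Hypothetical helper. `names` is the full list of element names in the
    desired order; `new_names` lists the elements that still need to be
    connected (they get a "create" directive, as in the query above).
    """
    # Ordered collections are still sets, so duplicates cannot be expressed.
    if len(set(names)) != len(names):
        raise ValueError("an ordered collection cannot contain duplicates")
    children = []
    for index, name in enumerate(names):   # indexes run from 0 to n-1
        child = {"index": index, "type": child_type, "name": name}
        if name in new_names:
            child["create"] = "unless_exists"
        children.append(child)
    return {"type": parent_type, "name": parent_name, prop: children}

# Insert B and F at the front of the list, then restate the whole order.
notes = ["C", "E", "G"]
notes[0:0] = ["B", "F"]
query = respecify_order("/user/docs/arpeggio", "broken CEG", "note",
                        "/user/docs/note", notes, new_names=["B", "F"])
```

The resulting dictionary has the same shape as the five-note query shown earlier and could be serialized to JSON and submitted as a write.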
If you are a musician, you probably know that broken chords often repeat a note. In practice,
we'd want to represent arpeggios like EGCE, where the note E appears twice. In Metaweb,
the value of a property is a set, and sets do not allow duplicates, even when they are ordered.
That is, ordered collections in Metaweb are still sets, not lists, and they do not allow
duplicates. In order to represent an arpeggio EGCE, therefore, we'd have to create two separate
note objects named E. But having two objects that both represent the note E is problematic:
the next and chord properties of the Note type are premised on the assumption that there
will only be one Note instance for each note.
5.1.11. Namespaces
In several places throughout this tutorial we've placed types into new namespaces simply to make
our queries a little easier to enter into the query editor. In this section we'll explore namespaces
in more detail.
We begin with a review of material from Chapter 2. First, remember that fully-qualified names
and namespaces don't have anything to do with the name property of an object. The name property
defines a human-readable display name for an object. Fully-qualified names are unique and can
be used as an alternative to the object guid.
Fully-qualified names are defined by the value type /type/key. Every object has a key property
that holds a set of /type/key values. If you want an object to have a fully-qualified name, insert
a key into its key property. The value property of the key specifies the object's unqualified or
local name. And the namespace property of the key specifies the object that defines the namespace.
The type /type/namespace exists, and defines the property /type/namespace/keys, which
is the reciprocal of /type/key/namespace. Objects that are used as namespaces are usually given
the type /type/namespace, but this is not required.
Namespaces are useful because they allow us to use fully-qualified names
to uniquely identify objects. If an object is given a key, then we can use its unique fully-qualified
name as the value of the id property. Identifying objects with a human-readable id is simpler
than using a long guid, and is more reliable than using the name and type properties together.
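The relationship between a key and a fully-qualified name can be written down in a few lines of Python. This is only an illustrative sketch: qualified_name and key_insert are hypothetical helper names, and the special case for the root namespace "/" is our assumption about how names in the root are formed.

```python
def qualified_name(namespace_id, value):
    """Return the fully-qualified name of a key `value` in `namespace_id`."""
    if namespace_id == "/":            # assumed rule for the root namespace
        return "/" + value
    return namespace_id + "/" + value

def key_insert(namespace_id, value):
    """Return a key sub-query that inserts `value` into `namespace_id`."""
    return {"connect": "insert", "namespace": namespace_id, "value": value}

# A write query fragment that would give the note C a fully-qualified name
note_c = {"type": "/user/docs/note", "name": "C",
          "key": key_insert("/user/docs/note", "C")}
```

Here qualified_name("/user/docs/note", "C") yields "/user/docs/note/C", which is the string we could then use as the value of an id property.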
With that review of namespaces, let's try to put some of the note objects we've created into a
namespace. We'll use the /user/docs/note type object as our namespace:
Write Result
[{ [{
"type":"/user/docs/note", "type":"/user/docs/note",
"name":"C", "name":"C",
"key":{ "key":{
"connect":"insert", "connect":"inserted",
"namespace":"/user/docs/note", "namespace":"/user/docs/note",
"value":"C" "value":"C"
} }
},{ },{
"type":"/user/docs/note", "type":"/user/docs/note",
"name":"E", "name":"E",
"key":{ "key":{
"connect":"insert", "connect":"inserted",
"namespace":"/user/docs/note", "namespace":"/user/docs/note",
"value":"E" "value":"E"
} }
},{ },{
"type":"/user/docs/note", "type":"/user/docs/note",
"name":"G", "name":"G",
"key":{ "key":{
"connect":"insert", "connect":"inserted",
"namespace":"/user/docs/note", "namespace":"/user/docs/note",
"value":"G" "value":"G"
} }
}] }]
This query gives the notes C, E, and G keys named "C", "E", and "G" within the namespace
/user/docs/note. That is, it defines fully-qualified names for these notes /user/docs/note/C,
/user/docs/note/E, and /user/docs/note/G. Now that these notes have unique ids, it becomes
(somewhat) easier to use them in queries. Here's how we might create a chord:
This query replaces the name and type properties of each note with a single id property. It doesn't
actually do anything, since we have already created the CEG chord. We've seen that we can use
a note's fully-qualified name as the value of its id property. What if we query the id of a note?
Read
{
  "type":"/user/docs/chord",
  "name":"CEG",
  "note":[{"id":null}]
}
Result
{
  "type":"/user/docs/chord",
  "name":"CEG",
  "note":[{"id":"#1f800000000104bd02"},
          {"id":"#1f800000000104beae"},
          {"id":"#1f800000000104beec"}]
}
We get the guids of the notes rather than the fully-qualified names we've just defined. Core types,
such as /type/type and /type/property, that use id as their default property return a fully-
qualified name instead of their guid in queries like this.
Read
{
  "id":"/user/docs/note",
  "/type/namespace/keys":[]
}
Result
{
  "id":"/user/docs/note",
  "/type/namespace/keys":[
    "next","chord","arpeggio","C","E","G"
  ]
}
Read
{
  "id":"/user/docs/note",
  "/type/namespace/keys":[{}]
}
Result
{
  "id":"/user/docs/note",
  "/type/namespace/keys":[{
    "type":"/type/key",
    "namespace":"/user/docs/default_domain/note/next",
    "value":"next"
  },{
    "type":"/type/key",
    "namespace":"/user/docs/default_domain/note/chord",
    "value":"chord"
  },{
    "type":"/type/key",
    "namespace":"/user/docs/default_domain/note/arpeggio",
    "value":"arpeggio"
  },{
    "type":"/type/key",
    "namespace":"/user/docs/default_domain/note/C",
    "value":"C"
  },{
    "type":"/type/key",
    "namespace":"/user/docs/default_domain/note/E",
    "value":"E"
  },{
    "type":"/type/key",
    "namespace":"/user/docs/default_domain/note/G",
    "value":"G"
  }]
}
The values of the /type/namespace/keys property are /type/key values that have value and
namespace properties. You'll notice that default_domain has crept back into the object ids in the
query results. This is interesting, but not terribly important. We'll investigate it later in this section.
There is one very important point to notice about these query results. When a key value is used
with /type/object/key, the namespace property is the id of the namespace object (such as
/user/docs/note) that holds the key. But when a key value is used with /type/namespace/keys,
the namespace property is the id of the object (such as /user/docs/note/C) contained by the
namespace. This is important to understand, so we'll state it another way: suppose that an object
o has a fully-qualified name in the namespace n. If we query the key property of o, we'll find a
/type/key object whose namespace property refers to n. And if we query the
If you wanted to create a Metaweb namespace browser application, you could repeat the query
above, starting with the id of the root namespace "/". The namespace properties of each of the
returned keys specify the ids of all objects in the root namespace. If you recursively query each
of these ids, you'll find the complete set of Metaweb objects with fully-qualified names.
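The recursive traversal just described might be sketched as follows. This is an illustration under stated assumptions rather than a finished browser: walk_namespace is a hypothetical name, and the read argument is assumed to be a function that behaves like the metaweb.read() utility of Example 4.3, taking a query and returning its result as a Python dictionary.

```python
def walk_namespace(read, ns_id, seen=None):
    """Return the ids of all objects reachable from the namespace `ns_id`.

    `read` is assumed to work like metaweb.read(): it takes a MQL read
    query (a dict) and returns the result (a dict), or None for no match.
    """
    if seen is None:
        seen = set()
    seen.add(ns_id)
    found = []
    result = read({"id": ns_id, "/type/namespace/keys": [{}]})
    if not result:
        return found
    for key in result.get("/type/namespace/keys") or []:
        # With /type/namespace/keys, the namespace property of each key
        # is the id of the object contained by the namespace.
        child_id = key["namespace"]
        if child_id in seen:       # an object may be reachable by many names
            continue
        seen.add(child_id)
        found.append(child_id)
        found.extend(walk_namespace(read, child_id, seen))
    return found
```

The seen set guards against visiting an object twice, since one object can have several fully-qualified names. A real application would probably also limit the recursion depth, because starting at "/" visits every named object.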
It is also possible to add objects to namespaces using the /type/namespace/keys property instead
of /type/object/key. The following query creates a new Note object named "G flat" and assigns
it the fully-qualified name /user/docs/note/G_flat:
Write Result
{ {
"id":"/user/docs/note", "id":"/user/docs/note",
"/type/namespace/keys":{ "/type/namespace/keys":{
"connect":"insert", "connect":"connected",
"value":"G_flat" "value":"G_flat",
"namespace":{ "namespace":{
"create":"unless_exists", "create":"created",
"name":"G flat", "name":"G flat",
"type":"/user/docs/note" "type":"/user/docs/note"
} }
} }
} }
Read Result
{ {
"id":"/user/docs/default_domain/note/G", "id":"/user/docs/default_domain/note/G",
"/user/docs/note/chord":[] "/user/docs/note/chord":["CEG","BFG"]
} }
So a single object can have more than one fully-qualified name. But can a fully-qualified name
refer to more than one object? Let's try to give the note F the same key that we assigned to G:
{
"type":"/user/docs/note",
"name":"F",
"key":{
"connect":"insert",
This query fails, although there is nothing obviously wrong with it. Metaweb simply will not allow
the fully-qualified name /user/docs/note/G to refer to two different note objects. If you want
to make /user/docs/note/G refer to the note F, you must first make sure that the name no longer
refers to the note G. This takes two queries. First, we must remove the fully-qualified name for
the note G:
Write Result
{ {
"id":"/user/docs/note/G", "id":"/user/docs/note/G",
"key":{ "key":{
"connect":"delete", "connect":"deleted",
"namespace":"/user/docs/note", "namespace":"/user/docs/note",
"value":"G" "value":"G"
} }
} }
Write Result
{ {
"type":"/user/docs/note", "type":"/user/docs/note",
"name":"F", "name":"F",
"key":{ "key":{
"connect":"insert", "connect":"inserted",
"namespace":"/user/docs/note", "namespace":"/user/docs/note",
"value":"G" "value":"G"
} }
} }
Now if we were to ask for the name of the note /user/docs/note/G, we'd get "F". Making a
fully-qualified name refer to another object is simpler if we use the /type/namespace/keys
property instead. Here's how we could make /user/docs/note/G refer to the note G again:
Write Result
{ {
"id":"/user/docs/note", "id":"/user/docs/note",
"/type/namespace/keys": { "/type/namespace/keys":{
"value":"G", "value":"G",
"namespace":{ "namespace":{
"connect":"update", "connect":"updated",
"type":"/user/docs/note", "type":"/user/docs/note",
"name":"G" "name":"G"
} }
} }
} }
This query locates the /type/key object that defines the name /user/docs/note/G, and updates
the namespace property of that key, so that the name points to a different object. Note that you
should not typically have to alter namespaces like this. Objects that have fully-qualified names
should typically be constants.
Finally, notice that changing the object to which a fully-qualified name refers (as we did above)
is a completely different operation than changing the fully-qualified name of an object. If we
wanted to refer to the note G by the name /user/docs/note/Gnatural instead of
/user/docs/note/G, we could do this:
Write Result
{ {
"name":"G", "name":"G",
"type":"/user/docs/note", "type":"/user/docs/note",
"key":[{ "key":[{
"connect":"delete", "connect":"deleted",
"namespace":"/user/docs/note", "namespace":"/user/docs/note",
"value":"G" "value":"G"
},{ },{
"connect":"insert", "connect":"inserted",
"namespace":"/user/docs/note", "namespace":"/user/docs/note",
"value":"Gnatural" "value":"Gnatural"
}] }]
} }
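The delete-then-insert pattern used to rename a key is regular enough to generate. The helper below is a hypothetical sketch (rename_key is not part of any Metaweb library) that builds a write query of the same shape as the one shown above.

```python
def rename_key(obj_type, obj_name, namespace_id, old_value, new_value):
    """Build a write query that replaces one key of an object with another.

    Hypothetical helper: the object is identified by type and name, and
    both keys live in the same namespace.
    """
    return {
        "type": obj_type,
        "name": obj_name,
        "key": [
            # Remove the old local name from the namespace...
            {"connect": "delete", "namespace": namespace_id, "value": old_value},
            # ...and add the new one in the same write.
            {"connect": "insert", "namespace": namespace_id, "value": new_value},
        ],
    }

query = rename_key("/user/docs/note", "G", "/user/docs/note", "G", "Gnatural")
```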
On the other hand, types, properties, and domains are Metaweb objects just like any others, and
they can be created and manipulated with MQL write queries. There is some educational value
in seeing how this is done, and there are a few things that we can do with MQL that we cannot
do through the client.
Write Result
{ {
"id":"/user/docs/note", "id":"/user/docs/note",
"/type/type/properties":{ "/type/type/properties":{
"create":"unless_exists", "create":"created",
"type":"/type/property", "type":"/type/property",
"name":"Previous", "name":"Previous",
"key":{ "key":{
"connect":"insert", "namespace":"/user/docs/note",
"namespace":"/user/docs/note", "connect":"inserted",
"value":"previous" "value":"previous"
}, },
"expected_type": "expected_type":
"/user/docs/note", "/user/docs/note",
"unique":true, "unique":true,
"master_property": "master_property":
"/user/docs/note/next" "/user/docs/note/next"
} }
} }
The first line of the query identifies the type to which we're adding the property. The second,
third and fourth lines specify that we're creating a new /type/property object and connecting
it to the properties property of our type. The fifth line gives the property the human-readable
name "Previous" and the following four lines define a key so that the property has the fully-
qualified name /user/docs/note/previous. The expected_type property specifies that the
newly created property should link to other Note objects. The unique property specifies that each
Note can have only a single value for the previous property. And, finally, the master_property
property specifies that this new property is the reciprocal of /user/docs/note/next.
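If you need to define several properties this way, the query can be assembled by a small function. The following sketch mirrors the structure of the query explained above; property_query is a hypothetical helper name, and the decision to omit unique and master_property when they are not supplied is our own assumption.

```python
def property_query(type_id, name, key, expected_type,
                   unique=False, master_property=None):
    """Build a write query that adds a property to the type `type_id`."""
    prop = {
        "create": "unless_exists",
        "type": "/type/property",
        "name": name,                       # human-readable display name
        # The key gives the property the fully-qualified name type_id/key
        "key": {"connect": "insert", "namespace": type_id, "value": key},
        "expected_type": expected_type,
    }
    if unique:
        prop["unique"] = True
    if master_property is not None:
        prop["master_property"] = master_property
    return {"id": type_id, "/type/type/properties": prop}

query = property_query("/user/docs/note", "Previous", "previous",
                       "/user/docs/note", unique=True,
                       master_property="/user/docs/note/next")
```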
After executing this query, you can test it by querying the previous property of the note G. You
can also use the freebase.com client to browse the note objects you've created and follow their
next and previous properties back and forth.
Domain objects have two properties: types, which specifies the set of types that are part of the
domain, and owners, which specifies one or more usergroups that own the domain and have permission
to create types in it. Creating a domain is not enough: we must also set its ownership. We'll give this new
domain the same owner as /user/docs/default_domain. So before we create the domain, let's
find out the owner of the existing one:
In addition to querying the owner of the domain, we also queried its types. Notice that it is not
co-typed with /type/namespace even though it is used as a namespace.
Now, we can create the new domain. Remember to use your own username in place of "docs",
and to substitute the id you obtained in the query above for the one shown here.
Write Result
{ {
"create":"unconditional", "create":"created",
"type":"/type/domain", "type":"/type/domain",
"owners":"#1f80000000010499ee", "owners":"#1f80000000010499ee",
"key":{ "key":{
"connect":"insert", "connect":"inserted",
"namespace":"/user/docs", "namespace":"/user/docs",
"value":"music" "value":"music"
} }
} }
The only thing we can do with a domain is add types to it. That is the subject of the next section.
Write Result
[{ [{
"create":"unless_exists", "create":"created",
"type":"/type/type", "type":"/type/type",
"name":"Note", "name":"Note",
"key":{ "key":{
Note that these queries specify a fully-qualified name for the newly created type objects, but also
specify their domain. /user/docs/music serves as both the namespace and the domain for the
new types, and it is important that both are specified.
The next step is to add properties. To keep this example simple, we will add a note property to
the chord type, and then (in a separate query) add the reciprocal property to the note type. Here's
how we define the note property:
Write Result
{ {
"id":"/user/docs/music/chord", "id":"/user/docs/music/chord",
"/type/type/properties":{ "/type/type/properties":{
"create":"unless_connected", "create":"created",
"type":"/type/property", "type":"/type/property",
"name":"Notes", "name":"Notes",
"expected_type": "expected_type":
"/user/docs/music/note", "/user/docs/music/note",
"key":{ "key":{
"connect":"insert", "connect":"inserted",
"namespace": "namespace":
"/user/docs/music/chord", "/user/docs/music/chord",
"value":"note" "value":"note"
} }
} }
} }
With these types and properties defined, we can now create instances:
Write Result
{ {
"create":"unless_exists", "create":"created",
"type":"/user/docs/music/chord", "type":"/user/docs/music/chord",
"name":"CG", "name":"CG",
"note":[{ "note":[{
"create":"unless_exists", "create":"created",
"type":"/user/docs/music/note", "type":"/user/docs/music/note",
"name":"C" "name":"C"
},{ },{
"create":"unless_exists", "create":"created",
"type":"/user/docs/music/note", "type":"/user/docs/music/note",
"name":"G" "name":"G"
}] }]
} }
pair:: property |
creation |
connection |
index-specification |
id-query
A creation directive is the keyword "create" in quotation marks followed by a colon and one of
the quoted strings "unconditional", "unless_exists" or "unless_connected".
Finally, an id-query is simply the identifier "id" in quotation marks, followed by a colon and
the keyword null, without quotation marks. Alternatively, "/type/object/id" can be used in place
of "id".
The examples in this chapter are all written in Python. They all communicate with the Metaweb
services at sandbox.freebase.com. And they are all designed to be run as command-line utilities.
Note that this differs from Chapter 4 which placed more emphasis on server-side code for web
applications. Many Metaweb-enabled web applications will be read-only, but the Python code
shown in this chapter should be straightforward to port for use in server-side scripts if you need
it.
Since Metaweb services are HTTP-based, Metaweb authentication is cookie based. You log in
by making an HTTP POST request to the URL http://www.freebase.com/api/account/login,
passing your username and password as URL-encoded form parameters. If login is successful,
Metaweb returns one or more HTTP cookies in the response headers. These cookies contain your
authentication credentials, and you must pass these back to Metaweb in the HTTP request headers
of all subsequent write and upload requests.
The names and values of the authentication cookies are an implementation detail rather than a
specification detail, and are subject to change. To ensure success, your code must accept all
cookies returned by the login service, and must present all of them to the mqlwrite service. If you
write your applications using a suitably high-level HTTP library, cookie handling may be
performed automatically for you. For this chapter, however, we explicitly handle the cookies at a
lower level.
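Handling cookies at a low level amounts to collecting the name=value part of every Set-Cookie header in the login response and joining them into a single Cookie request header. The sketch below shows one way this might be done; cookie_header is a hypothetical helper, and it assumes the Set-Cookie header values have already been extracted from the response as a list of strings.

```python
def cookie_header(set_cookie_values):
    """Combine Set-Cookie header values into one Cookie header value.

    `set_cookie_values` is assumed to be a list with one string per
    Set-Cookie header returned by the login service.
    """
    pairs = []
    for value in set_cookie_values:
        # The text before the first ';' is the name=value pair; what
        # follows are attributes (path, expiry) that are not sent back.
        pair = value.split(";", 1)[0].strip()
        if pair:
            pairs.append(pair)
    # Present every cookie we were given, without interpreting any of them.
    return "; ".join(pairs)
```

Because the function never looks at cookie names, it keeps working if the names and values of the authentication cookies change.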
The cookies returned by the login service are persistent, which means that you do not have to
log into the freebase.com client each time you visit the site. Nevertheless, when writing scripts
that use Metaweb services, the best practice is to assume that your cookies have expired and log
in each time the script is invoked.
Example 6.1 shows Python code that defines a metaweb.login() utility function. This function
sends a username and password to the login service, and returns cookies that can be treated as
opaque authentication credentials. If login fails, the function raises a metaweb.MQLError exception.
This login utility function is part of a larger metaweb.py module. Other examples in this chapter
are also part of the module. We first saw metaweb.py in Chapter 4, where Example 4.3 defined
a metaweb.read() utility function. Example 4.3 also includes the definition of the MQLError
exception class used in this example. Example 6.1 (and all our other Python examples) also depends
on the simplejson module, which is available from http://cheeseshop.python.org/pypi/simplejson.
Example 6.1. metaweb.py: logging in to Metaweb with Python
# Submit the specified username and password to the Metaweb login service.
# Return opaque authentication credentials on success.
# Raise MQLError on failure.
def login(username, password):
    # Establish a connection to the server and make a request.
    # Note that we use the low-level httplib library instead of urllib2.
    # This allows us to manage cookies explicitly.
    conn = httplib.HTTPConnection(host)
    conn.request('POST',                 # POST the request
                 loginservice,           # The URL path /api/account/login
                 # The body of the request: encoded username/password
                 urllib.urlencode({'username':username, 'password':password}),
                 # This header specifies how the body of the post is encoded.
                 {'Content-type': 'application/x-www-form-urlencoded'})
If the login fails, then the status property of the JSON object will be something other than "200
OK". In this case, the messages property is an array (typically with just one element) of message
objects. The text property of the first element of this array includes an error message that describes
what went wrong. (Typically, this is "Invalid username or password".)
X-Metaweb-Request
The Metaweb mqlwrite service (and also the upload service documented later in this chapter)
requires a custom HTTP request header, named X-Metaweb-Request, to be present in all
requests. The value of the header can be anything.
The requirement that the custom header be present is a security measure to prevent CSRF
(cross-site request forgery) attacks. It has a profound implication for the services that require it:
these services cannot be invoked via HTML form submission, since there is no way to tell
a web browser to add a custom header like this when POSTing a form.
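In a script, supplying the header is a one-line matter. The following sketch shows the request headers a write might carry; write_headers is a hypothetical helper, and credentials is assumed to be the Cookie header value obtained when logging in.

```python
def write_headers(credentials):
    """Return the HTTP request headers for a write or upload request.

    `credentials` is assumed to be a ready-made Cookie header value
    holding all of the cookies returned by the login service.
    """
    return {
        "Cookie": credentials,
        # Only the presence of this header matters; the value is arbitrary.
        "X-Metaweb-Request": "True",
    }
```

A dictionary like this can be passed as the headers argument of an HTTP client call, in the same position as the Content-type header in Example 6.1.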
The query and response envelopes are described in the sub-sections that follow, and those
explanations are followed by example code that performs writes.
{
"create":"unless_exists",
{
"query":{
"create":"unless_exists",
"type":"/common/topic",
"name":"my test object"
}
}
The name "query" is arbitrary, but it will be reused in the response envelope.
result
    The value of this property is an object that holds the results of the submitted
    queries. If the query envelope used the property name query for the query, then
    the response envelope will make the results of the query available as result.query.
    If a query fails, this property will be set to null.
status
    For successful queries, this property will have the value "200 OK".
queries
    The value of this property is a copy of the query envelope that was submitted.
messages
    This property is an array of error messages. For successful queries it is empty. For
    unsuccessful queries, it contains one or more message objects. Each message object
    has the following properties:
    args
        An object that provides additional details about the error. For example,
        if a query uses "create":"unless_exists" and it cannot complete because
        two matching objects exist, then the text property might be "Need
        a unique result to attach here, not 2", and the args object will specify
        the guids of the two matching objects.
    query
        A copy of the query object (not the query envelope, but the query itself)
        that contains the error, with the addition of a special error_inside
        property, to indicate where the error occurs.
    level
        A string that indicates the severity of the error. A typical value is "error".
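Given the envelope properties described above, client code typically checks status, and either returns the named result or reports the first error message. Here is a minimal sketch of that logic. The MQLError class below is only a stand-in for the one defined in Example 4.3, and unwrap is a hypothetical helper name.

```python
class MQLError(Exception):
    """Stand-in for the MQLError exception class of Example 4.3."""

def unwrap(envelope, name="query"):
    """Return the result named `name` from a parsed response envelope.

    `envelope` is the response envelope after JSON decoding (a dict).
    Raise MQLError with the text of the first message on failure.
    """
    if envelope.get("status") != "200 OK":
        messages = envelope.get("messages") or []
        if messages:
            text = messages[0].get("text", "unknown error")
        else:
            text = "unknown error"
        raise MQLError(text)
    # The result is available under the name used in the query envelope.
    return envelope["result"][name]
```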
The code in Example 6.2 uses the metaweb.MQLError exception class. That class was defined in
Example 4.3.
Next, create a Metaweb type to model this data. Log in to the freebase.com client and create a
new type named "US State Quarter". (Review the procedure for creating types and adding
properties in Chapter 5, if necessary.) If your Freebase username is "fred", this will create a type with
id /user/fred/default_domain/us_state_quarter.
• A property named "State", of /type/text to specify the name of the state. (The freebase.com
client specifies types by name rather than by id, so use the type name "Text" in the client).
• A property named "Release", of /type/datetime, to specify the date on which the quarter was
released into circulation. (Use the type name "Date/Time").
• A property named "Mintage", of /type/int, to specify how many quarters were minted. (Use
the type name "Integer".)
• A property named "Statehood", of /type/datetime to specify when the state gained statehood.
Be sure to make each of these properties unique by clicking the "Restrict to one value" checkbox.
Next, you need to get your data into manageable form. Extract data from the US Mint site, and
arrange it in a plain text file named quarters.txt that looks like the following:
Delaware,1999-01-04,1787-12-07,774824000
Pennsylvania,1999-03-08,1787-12-12,707332000
New Jersey,1999-05-17,1787-12-18,662228000
Georgia,1999-07-19,1788-01-02,939932000
[1] Specifically the page http://www.usmint.gov/mint_programs/50sq_program/index.cfm?action=schedule
Each line in this file is the data for a single quarter. Fields are separated by commas. The first
field is the name of the state. The second and third fields are the release date and statehood date
for that state. And the fourth field is the mintage for that state quarter.
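Turning each line of this file into a write query is mostly a matter of splitting on commas and converting the mintage to an integer. The sketch below illustrates the idea. The property names release, statehood, and mintage are assumptions based on the property names chosen above, the TYPEID value assumes the username "fred", and quarter_queries is a hypothetical helper name.

```python
import csv

# Assumed type id; substitute your own Freebase username for "fred".
TYPEID = "/user/fred/default_domain/us_state_quarter"

def quarter_queries(lines):
    """Convert lines of quarters.txt into a list of MQL write queries."""
    queries = []
    for row in csv.reader(lines):
        if not row:                      # skip blank lines
            continue
        state, release, statehood, mintage = row
        queries.append({
            "create": "unless_exists",
            "type": TYPEID,
            "state": state,              # /type/text
            "release": release,          # /type/datetime, e.g. 1999-01-04
            "statehood": statehood,      # /type/datetime
            "mintage": int(mintage),     # /type/int
        })
    return queries
```

Since an open file is an iterable of lines, the function can be called as quarter_queries(open("quarters.txt")).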
With our type created, and the data in this format, we can now write a simple script to upload
the data to freebase.com. Example 6.3 shows Python code to do this. Note that you need to insert
your own Freebase username and password into the script to make it work for you.
When you include multiple writes in a single query, the writes are executed atomically: they all
succeed or they all fail. As a result, they are not allowed to depend on each other, and there is no
way to tell what order they are executed in.
If you submit multiple queries in a single envelope, they are not atomic. Each one succeeds or
fails on its own. JSON envelopes are unordered collections of properties, and there is no guarantee
that the queries will be executed in the order in which they are written. For this reason, queries
that share an envelope should not depend on each other.
The fact that mqlwrite can accept multiple queries explains why an envelope object is required
in the first place. It is needed so that each query has a name that can be used for query results. If
multiple named queries are submitted in an envelope, then the response envelope contains multiple
results. If the queries are named q1 and q2, then the responses are available as result.q1 and
result.q2.
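The naming scheme can be illustrated with a short sketch. It assumes, following the envelope description in this chapter, that an envelope is simply an object whose property names are the query names. The helper names build_envelope and results_by_name are hypothetical, and the standard json module is used here in place of simplejson.

```python
import json

def build_envelope(named_queries):
    """Serialize a dict that maps query names to independent write queries."""
    return json.dumps(named_queries)

def results_by_name(response, names):
    """Return a dict that maps each query name to its result (or None)."""
    result = response.get("result") or {}
    return dict((name, result.get(name)) for name in names)

# Two independent queries that do not depend on each other
queries = {
    "q1": {"create": "unless_exists", "type": "/common/topic", "name": "first"},
    "q2": {"create": "unless_exists", "type": "/common/topic", "name": "second"},
}
body = build_envelope(queries)
```

After the response has been decoded, results_by_name(response, queries.keys()) pairs each result with the name under which its query was submitted.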
No Duplicates
The upload service does not always create a new /type/content object for the content you
upload. All content is checksummed when it is uploaded, and these checksums are used to
detect duplicate uploads. If the content you are uploading already exists in the Metaweb
content store, the existing /type/content object is used. The blob_id property of
/type/content holds the checksum value.
In order to understand this code, you have to know that the upload service handles images
specially. When you upload images, the /type/content object is also given the type /common/image.
Furthermore, the upload service determines the size of the image and creates an appropriate
/measurement_unit/rectangle_size object for the /common/image/size property. Because
your uploaded /type/content is co-typed as /common/image, you can link directly to image
content from /common/topic/image.
# Define a write query to link the quarter object to the uploaded image
query = { 'type': TYPEID,
          'state':state,
          '/common/topic/image': { 'id':id, 'connect':'insert' }}
Once you have run the code in Example 6.5, use the freebase.com client to view your state quarter
topics. You'll see that there are now images on the page.
This Python program expects the name of an HTML file as a command-line argument. It reads
the file and determines the document title by searching for a <title> tag. It uploads the document
text with the metaweb.upload() method of Example 6.4. Then it submits a MQL write query to
create a /common/topic that refers to a /common/document that refers to the uploaded content. It
uses the document title as the name of the /common/topic object.
Example 6.7 performs a number of read queries as well as writes. To do this, it uses the
metaweb.read() utility defined in Example 4.3 of Chapter 4.
Example 6.7 is a relatively complex example. It demonstrates the mqlwrite service, of course,
but also demonstrates low-level reads and writes of types and properties. If you can make sense
of this example, you have a solid understanding of the Metaweb architecture.
return typecache[t]
#
# Return a MQL write query that will unlink the object with the specified id
#
def makeUnlinkQuery(id):
    # Find all types of the object
    types = metaweb.read({ 'id':id, 'type':[] })['type']
    # The query is structured in such a way that the result is almost ready
    # to be reused as a write query. We loop through the properties of the
    # result, and for any that are non-empty, we copy them to a query object
    # adding "connect":"delete" to transform it into a MQL write query
    q = {}
    for p,v in r.iteritems():
        if p == 'id':      # leave the id of the query alone
            q[p] = v
            continue
        # if the property is not id, then the value is an array
        # if the array is empty, then skip this property; don't copy to query
        if len(v) == 0: continue
        # otherwise, iterate through the elements of the array
        # and add the connect:delete directive to each one
        for elt in v: elt['connect'] = 'delete'
        # and copy to the query
        q[p] = v
    # Return the MQL write query that we can use to unlink the object
    return q
def main():
    # Use the getopt module to parse the command line arguments
    try:
        opts, args = getopt.getopt(sys.argv[1:],
                                   "u:p:n:t:",
                                   ["user=","password=","name=","type=","debug"])
    except getopt.error, msg:
        print msg
        sys.exit(0)
# If we're executing this file from the command-line, run the main() method
if __name__ == "__main__": main()
A.1. json.js
Example A.1 is the json.js module that defines the JSON.parse() and JSON.serialize()
functions used in the JavaScript-based examples of Chapter 4.
/**
* json.js:
* This file defines functions JSON.parse() and JSON.serialize()
* for decoding and encoding JavaScript objects and arrays from and to
* application/json format.
*
* The JSON.parse() function is a safe parser: it uses eval() for
* efficiency but first ensures that its argument contains only legal
* JSON literals rather than unrestricted JavaScript code.
*
* This code is derived from the code at http://www.json.org/json.js
* which was written and placed in the public domain by Douglas Crockford.
**/
// This object holds our parse and serialize functions
var JSON = {};
s = { // Map type names to functions for serializing those types
    'boolean': function (x) { return String(x); },
    'null': function (x) { return "null"; },
    number: function (x) { return isFinite(x) ? String(x) : 'null'; },
    string: function (x) {
        if (/["\\\x00-\x1f]/.test(x)) {
            x = x.replace(/([\x00-\x1f\\"])/g, function(a, b) {
                var c = m[b];
                if (c) {
                    return c;
                }
                c = b.charCodeAt();
                return '\\u00' +
                    Math.floor(c / 16).toString(16) +
                    (c % 16).toString(16);
            });
        }
        return '"' + x + '"';
    },
    array: function (x) {
        var a = ['['], b, f, i, l = x.length, v;
        for (i = 0; i < l; i += 1) {
            v = x[i];
            f = s[typeof v];
            if (f) {
                v = f(v);
                if (typeof v == 'string') {
                    if (b) {
                        a[a.length] = ',';
                    }
                    a[a.length] = v;
                    b = true;
                }
            }
        }
        a[a.length] = ']';
        return a.join('');
    },
    object: function (x) {
        if (x) {
            if (x instanceof Array) {
                return s.array(x);
            }
            var a = ['{'], b, f, i, v;
            for (i in x) {
                v = x[i];
                f = s[typeof v];
                if (f) {
                    v = f(v);
                    if (typeof v == 'string') {
                        if (b) {
The proxy-based implementation of Example A.2 behaves identically to the script-based
implementation in Example 4.8. We could convert the test application of Example 4.7 to use this new
implementation simply by changing this line:
to this:
/**
* metaweb_proxy.js:
*
* This file implements a Metaweb.read() utility function using XMLHttpRequest
This Metaweb.read() implementation is not complete without the mqlread.php proxy script that
it relies on. Example A.3 shows a very simple implementation of such a proxy: a PHP script
that forwards the queries URL parameter to the mqlread service at Metaweb and relies on
the default behavior of the PHP curl_exec() function, which sends the response to the
forwarded query directly to the output stream of the script. Note that this trivially simple
script is not a fully adequate proxy. For a production web application, you would want a
more complete proxy.
<?php
$q = str_replace("\\\"", "\"", $_GET["queries"]);
$url = "http://www.freebase.com/api/service/mqlread?queries=" . urlencode($q);
$request = curl_init($url);
curl_setopt($request, CURLOPT_COOKIE, "###freebase.com cookie data here###");
curl_exec($request);
curl_close($request);
?>
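The URL handling in the proxy can be illustrated without contacting any server. The following
Python sketch mirrors the first two lines of the PHP script: it removes the backslashes that
precede double quotes in the incoming parameter, then URL-encodes the result and appends it to
the mqlread URL. The function name forwarded_url() and the sample query text are hypothetical,
and quote_plus() is used here as a rough counterpart of PHP's urlencode(), not as an exact
equivalent.

```python
try:
    from urllib.parse import quote_plus   # Python 3
except ImportError:
    from urllib import quote_plus          # Python 2

# Service URL taken from the PHP proxy script
MQLREAD_URL = "http://www.freebase.com/api/service/mqlread"

def forwarded_url(raw_queries):
    """Build the URL to which the proxy would forward the queries parameter."""
    # Undo backslash-escaping of double quotes, as the str_replace() call does
    unescaped = raw_queries.replace('\\"', '"')
    # Encode the query text so that it is safe to place in a URL
    return MQLREAD_URL + "?queries=" + quote_plus(unescaped)

# A hypothetical parameter value that arrived with escaped quotes
url = forwarded_url('{\\"id\\":\\"/en/x\\"}')
```

The sketch stops at building the URL; sending the request and relaying the response are left to
the curl calls shown in the PHP script.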
<script src="json.js"></script>
<script src="metaweb.js"></script>
<script src="validateAndComplete.js"></script>
<script>
window.onload = function() {
    // When the document loads, set up autocompletion for our text fields
    Metaweb.addValidationAndCompletion(document.getElementById("country"),
                                       "/location/country",
                                       document.getElementById("countrymsg"));
    // Note that we add an additional constraint here
    Metaweb.addValidationAndCompletion(document.getElementById("band"),
                                       "/music/artist",
                                       document.getElementById("bandmessage")
                                       /*{genre:"Rock"}*/);
}
</script>
<style>
/* Styles for incomplete and invalid input */
/**
* Add an onchange event handler to the specified textfield object
* to validate the user's input and autocomplete it if necessary.
* The type argument is the Metaweb type, such as "/music/artist"
* for which autocompletion should be done.
*
* The optional message argument specifies a document element into
* which error messages (such as "invalid input" or "incomplete input")
* should be displayed. The optional constraints argument is an object
* that contains additional MQL properties that should be added to the
* query. This can be used to further constrain the autocompletion
* beyond simple type-based autocompletion.
*
* If the callback handler determines that the user's input is invalid or
* incomplete, it sets the className property of the text field to "invalid"
* or "incomplete" (overwriting any other class values specified by that property).
* You can define these CSS classes to set background colors or otherwise
* highlight fields that require the user's attention.
*
* Validation and autocompletion are done using the ~= pattern matching
* operator to ask for results of the specified type that begin with
* the specified string. The query requests that results are sorted by
* name and that only the first two are returned. If no results are
* returned, this means that the user's input is invalid. If exactly
* one result is returned, then the user's input is unique and
* autocompletion is performed. If two results are returned, then the
* user's input is not unique, but may still be valid, if the input
// Now submit the query and pass the results to the nested function
Metaweb.read(query, function(results) {
    // If there are no results, input is invalid
    if (results.length == 0) {
        // Set invalid class and display invalid message
        textfield.className = "invalid";
        if (message) {
            message.innerHTML = "invalid input";
            message.style.visibility = "visible";
        }
    }
    // If there is one result or if the first result
    // matches exactly, input is valid.
    else if (results.length == 1 ||
             results[0].name.toLowerCase() == input) {
        // Autocomplete the value.
        // Use capitalization Metaweb returns to us