MusicXML Tutorial
1 Tutorial
MusicXML is a digital sheet music interchange and distribution format. The goal is to create a
universal format for common Western music notation, similar to the role that the MP3 format
serves for recorded music. The musical information is designed to be usable by notation
programs, sequencers and other performance programs, music education programs, and music
databases.
The goal of this tutorial is to introduce MusicXML to software developers who are interested in
reading or writing MusicXML files. MusicXML has many features that are required to support
the demands of professional-level music software. But you do not need to use or understand all
these elements to get started.
MusicXML FAQ
Why did we need a new format? What's behind some of the ways that MusicXML looks and
feels? What software tools can I use? Is MusicXML free?
Notation Basics
Here we discuss the basic notation features that go beyond MIDI's capabilities, including stems,
beams, accidentals, articulations, and directions.
Tablature
Here we describe the basics of tablature notation: specifying strings, frets, string tunings, and
guitar-specific notations like hammer-ons and pull-offs.
Percussion
Here we discuss the steps needed to represent unpitched percussion parts such as drum kits. Some
of these techniques apply to other types of music, such as the use of multiple instruments,
alternate noteheads, and different measure styles.
Is MusicXML free?
Yes. MusicXML 3.1 is developed by the W3C Music Notation Community Group and licensed
under the W3C Community Final Specification Agreement.
See our MusicXML software page at www.musicxml.com/software/ for a more complete list,
including links to each application.
Here it is in MusicXML:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE score-partwise PUBLIC
    "-//Recordare//DTD MusicXML 3.1 Partwise//EN"
    "http://www.musicxml.org/dtds/partwise.dtd">
<score-partwise version="3.1">
  <part-list>
    <score-part id="P1">
      <part-name>Music</part-name>
    </score-part>
  </part-list>
  <part id="P1">
    <measure number="1">
      <attributes>
        <divisions>1</divisions>
        <key>
          <fifths>0</fifths>
        </key>
        <time>
          <beats>4</beats>
          <beat-type>4</beat-type>
        </time>
        <clef>
          <sign>G</sign>
          <line>2</line>
        </clef>
      </attributes>
      <note>
        <pitch>
          <step>C</step>
          <octave>4</octave>
        </pitch>
        <duration>4</duration>
        <type>whole</type>
      </note>
    </measure>
  </part>
</score-partwise>
Let's look at each part in turn:
Attributes
The attributes element contains information about time signatures, key signatures, transpositions,
clefs, and other musical data that is usually specified at the beginning of a piece or at the start of a
measure. We discuss the MIDI-compatible elements here; the rest are discussed in the following
sections.
In this example, our Finale translator produces the following MIDI-compatible attributes:
<attributes>
  <divisions>24</divisions>
  <key>
    <fifths>-3</fifths>
    <mode>minor</mode>
  </key>
  <time>
    <beats>3</beats>
    <beat-type>4</beat-type>
  </time>
</attributes>
Divisions
Musical durations are commonly referred to as fractions: whole notes, half notes, quarter notes,
and the like. While each musical note could have a fraction associated with it, MusicXML instead
follows MIDI by specifying the number of divisions per quarter note at the start of a musical part,
and then specifying note durations in terms of these divisions.
MusicXML allows divisions to change in the middle of a part, but most software will probably
find it easiest to compute one divisions value per part and put that at the beginning of the first
measure.
Key
Standard key signatures are represented very much like MIDI key signatures. The fifths element
specifies the number of flats or sharps in the key signature - negative for flats, positive for sharps.
The fifths name indicates that this value represents the key signature's position on the circle of
fifths. MusicXML uses the mode element to indicate major or minor key signatures.
Time
Standard time signatures are represented more simply in MusicXML than in MIDI. The beats
element represents the time signature numerator, and the beat-type element represents the time
signature denominator (vs. a log denominator in MIDI).
Transpose
If you are writing a part for a transposing instrument, the transposition must be specified in
MusicXML in order for the sound output to be correct. The transpose element represents what
must be added to the written pitch to get the correct sounding pitch.
The chromatic element, representing the number of chromatic steps to add to the written pitch, is
the one required element. The diatonic, octave-change, and double elements are optional.
Say we have a part written for a trumpet in B-flat. A written "C" on this part will sound as a B-
flat on a piano. This transposition is one diatonic step down (C to B) and two chromatic half steps
down (C to B to B-flat). In MusicXML it would be represented as:
<transpose>
  <diatonic>-1</diatonic>
  <chromatic>-2</chromatic>
</transpose>
The diatonic element is not needed for correct MIDI output, but it helps get transposition notation
correct and programs are encouraged to use it wherever possible.
The octave-change element is used when transpositions exceed an octave in either direction. The
double element is used when the part should be doubled an octave lower, as when a single part is
used for both cello and string bass.
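As a sketch of the two optional elements just described, here are two transpose fragments that are not taken from the example score. The first assumes a tenor saxophone in B-flat, which sounds a major ninth below the written pitch: the same step down as the trumpet, plus one octave. The second assumes a non-transposing part that should also sound an octave lower, as in the cello and string bass case.

```xml
<!-- Assumed instrument: tenor saxophone in B-flat (major ninth down). -->
<transpose>
  <diatonic>-1</diatonic>
  <chromatic>-2</chromatic>
  <octave-change>-1</octave-change>
</transpose>

<!-- Assumed case: no transposition, but the part is doubled an octave lower. -->
<transpose>
  <chromatic>0</chromatic>
  <double/>
</transpose>
```

Keeping the octave in octave-change, rather than folding it into chromatic as -14, keeps the diatonic and chromatic values within a single octave.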
Pitch
Pitch, duration, ties, and lyrics are all represented within the MusicXML note element. For
example, the E-flat that starts bar 3 in the voice part has the following MIDI-compatible
elements:
<note>
  <pitch>
    <step>E</step>
    <alter>-1</alter>
    <octave>5</octave>
  </pitch>
  <duration>24</duration>
  <tie type="start"/>
  <lyric>
    <syllabic>end</syllabic>
    <text>meil</text>
  </lyric>
</note>
Duration
The duration element is an integer that represents a note's duration in terms of divisions per
quarter note. Since our example has 24 divisions per quarter note in the voice part, a quarter note
has a duration of 24. The eighth-note triplets have a duration of 8, while the eighth notes have a
duration of 12.
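The arithmetic can be sketched with a few notes, assuming the same 24 divisions per quarter note. The pitches are placeholders and the notes are not taken from the example score.

```xml
<!-- Assumes <divisions>24</divisions> earlier in the part. -->
<note>
  <pitch><step>G</step><octave>4</octave></pitch>
  <duration>48</duration>  <!-- half note: 2 x 24 -->
</note>
<note>
  <pitch><step>G</step><octave>4</octave></pitch>
  <duration>12</duration>  <!-- eighth note: 24 / 2 -->
</note>
<note>
  <pitch><step>G</step><octave>4</octave></pitch>
  <duration>8</duration>   <!-- eighth-note triplet member: 24 / 3 -->
</note>
<note>
  <pitch><step>G</step><octave>4</octave></pitch>
  <duration>6</duration>   <!-- 16th note: 24 / 4 -->
</note>
```

A divisions value of 24 is convenient because it divides evenly by both 2 and 3, so duplets and triplets all get integer durations.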
Tied Notes
The sounding part of a tied note is indicated by the tie element. The tie element has a type of start
for the starting note of a tie, and a type of stop for the ending note in a tie. A note element can
have two tie elements. If a note is tied to the notes both before and after it, place the tie to the
previous note, <tie type="stop"/>, before the <tie type="start"/> to the next note.
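A note in the middle of a chain of ties might look like the following sketch. The pitch and duration are placeholders, assuming 24 divisions per quarter note.

```xml
<note>
  <pitch>
    <step>E</step>
    <alter>-1</alter>
    <octave>5</octave>
  </pitch>
  <duration>24</duration>
  <tie type="stop"/>   <!-- ends the tie from the previous note -->
  <tie type="start"/>  <!-- begins the tie to the next note -->
</note>
```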
Chords
The duration elements in MusicXML move a musical counter. To play chords, we need to
indicate that a note should start at the same time as the previous note, rather than following the
previous note. To do this in MusicXML, add a chord element to the note.
In our example, the piano part does not have rhythms more complex than eighth notes, so our
converter sets the divisions value to 2. With 2 divisions per quarter note, the sound portion of the
first chord in the piano part is represented as:
<note>
  <pitch>
    <step>C</step>
    <octave>4</octave>
  </pitch>
  <duration>1</duration>
</note>
<note>
  <chord/>
  <pitch>
    <step>E</step>
Lyrics
While lyrics are not yet used in sound generation, they are included in Standard MIDI files, so we
will discuss them here with the other MIDI-compatible features of MusicXML.
Lyrics in MusicXML use an optional syllabic element to indicate how a syllable fits into a word,
rather than having conventions based on hyphens and spaces as some other formats do. The
values for syllabic can be "single", "begin", "end", or "middle". We saw earlier that the E-flat
starting the third measure had a syllabic value of "end", since "meil" was the end of a two-
syllable word. The "ma" syllable in "image" has a syllabic value of "middle". In the second
measure, the notes are:
<note>
  <pitch>
    <step>G</step>
    <octave>4</octave>
  </pitch>
  <duration>24</duration>
  <lyric>
    <syllabic>single</syllabic>
    <text>Dans</text>
  </lyric>
</note>
<note>
  <pitch>
    <step>C</step>
    <octave>5</octave>
  </pitch>
  <duration>24</duration>
  <lyric>
    <syllabic>single</syllabic>
    <text>un</text>
  </lyric>
</note>
<note>
  <pitch>
    <step>D</step>
    <octave>5</octave>
Multi-Part Music
While monophonic instruments like trumpet, flute, and voice move along one note at a time,
instruments like the piano can have many musical lines at once. Take this simple example from
the first bar of Frederic Chopin's Prelude, Op. 28, No. 20:
Within the piano part, there are two musical lines for the left hand and right hand, represented in
the two staves. On the third beat of the bar, the right hand divides into two lines as well.
We mentioned earlier that the duration element in a note moves the MusicXML musical counter,
and that a chord element keeps this counter from moving further. In order to represent parallel
musical parts, we need to be able to move the musical counter forwards and backwards
independently of notes. This is what the forward and backup elements let us do.
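As a generic sketch of how the counter moves, independent of the Chopin example, assume 4 divisions per quarter note in 4/4 time. One line holds a whole note, and a second line enters with a half note on beat 3. The pitches are placeholders.

```xml
<note>
  <pitch><step>C</step><octave>5</octave></pitch>
  <duration>16</duration>  <!-- whole note; the counter is now at 16 -->
</note>
<backup>
  <duration>16</duration>  <!-- counter returns to the start of the measure -->
</backup>
<forward>
  <duration>8</duration>   <!-- skip two beats without writing a note or rest -->
</forward>
<note>
  <pitch><step>E</step><octave>4</octave></pitch>
  <duration>8</duration>   <!-- half note starting on beat 3 -->
</note>
```

After the last note the counter is back at 16, the end of the measure, which is where a reader would expect it to be before the next measure begins.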
Let's say we have 4 divisions per quarter in this example. We could approach the divided parts in
the right hand in two ways. Finale, for instance, can represent multiple parts using either the layer
feature or the voice 1/voice 2 feature. When using layers, each independent part generally covers
a complete measure. Say that the G and B-natural on beat 3 are in layer 2, with all the other notes
in layer 1. After completing the last chord in layer 1, we would use the following to add the layer
2 notes:
<backup>
  <duration>16</duration>
</backup>
<note>
  <pitch>
    <step>E</step>
    <alter>-1</alter>
    <octave>4</octave>
  </pitch>
  <duration>3</duration>
</note>
<note>
Repeats
Repeats and endings are represented by the <repeat> and <ending> elements with a <barline>, as
defined in the barline.mod file.
In regular measures, there is no need to include the <barline> element. It is only needed to
represent repeats, endings, and graphical styles such as double barlines.
A forward repeat mark is represented by a left barline at the beginning of the measure (following
the attributes element, if there is one):
<barline location="left">
  <bar-style>heavy-light</bar-style>
  <repeat direction="forward"/>
</barline>
The repeat element is what is used for sound generation; the bar-style element only indicates
graphic appearance.
Similarly, a backward repeat mark is represented by a right barline at the end of the measure:
<barline location="right">
  <bar-style>light-heavy</bar-style>
  <repeat direction="backward"/>
</barline>
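Endings use the same barline element. The following sketch assumes a first ending that spans a single measure and leads into the backward repeat, followed by the start of a second ending in the next measure. It is an illustration of the ending element's number and type attributes, not an excerpt from the example score.

```xml
<!-- Measure under the first ending: bracket starts at the left barline. -->
<barline location="left">
  <ending number="1" type="start"/>
</barline>

<!-- Same measure: bracket stops at the right barline, which also repeats. -->
<barline location="right">
  <bar-style>light-heavy</bar-style>
  <ending number="1" type="stop"/>
  <repeat direction="backward"/>
</barline>

<!-- Next measure: the second ending begins. -->
<barline location="left">
  <ending number="2" type="start"/>
</barline>
```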
Sound Suggestions
Musical scores abound with ambiguous notations for dynamics, tempo, and other musical
elements. To automatically generate a MIDI or other sound file, some value must be used for
dynamics or tempo. MusicXML defaults to a MIDI velocity of 90 (roughly a forte); no default
tempo is specified.
Sound suggestion elements and attributes guide the creation of a sound file. Most of the sound
suggestions are found in the sound element, defined in the direction.mod file. Tempo is specified
in quarter notes per minute. Dynamics are specified as a percentage of the standard MusicXML
forte velocity. For example, the following sound element specifies a tempo of quarter note = 88
with a MIDI velocity of 64:
<sound tempo="88" dynamics="71"/>
The other attributes for the sound element are described in the direction.mod file. Sound
suggestions are also available for grace notes (the steal-time-previous, steal-time-following, and
make-time attributes, defined in the note.mod file) and for ornaments (the trill-sound entity for
trills and other ornaments, defined in the common.mod file).
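The percentage arithmetic for the dynamics attribute can be sketched as follows, taking the default forte velocity of 90 as 100 percent. Only the second line comes from the text; the other two values are chosen for illustration.

```xml
<sound dynamics="100"/>             <!-- 90 x 1.00 = MIDI velocity 90 -->
<sound tempo="88" dynamics="71"/>   <!-- 90 x 0.71 = 63.9, about velocity 64 -->
<sound tempo="120" dynamics="50"/>  <!-- 90 x 0.50 = MIDI velocity 45 -->
```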
Clearly our discussion of the MIDI-compatible portion of MusicXML left out many things
represented in this music. Where are the tempo and dynamic markings: the Andantino, pp, dolce,
crescendo and diminuendo wedges? Where are stem directions stored? The downstem on the
initial G in the voice part is not what many programs would default to. How is the beaming
represented, so that all the eighth notes are beamed together in the piano part, but separated into
triplets in the voice part? How are the piano chords split between staves? How are accidentals
indicated, including courtesy accidentals like the A-flat in the fourth bar?
A fundamental part of MusicXML is the distinction between elements that primarily represent the
sound of the music versus those that represent its appearance. We discussed the sound elements in
the previous section, and they are of great use to applications dealing with MIDI or other sound
files. Now we discuss the elements for musical appearance, which are of great use to music
notation applications.
Here is what the beginning of the voice part looks like for "Après un rêve," up to the end of the
first measure:
<part id="P1">
  <measure number="1">
    <attributes>
      <divisions>24</divisions>
      <key>
        <fifths>-3</fifths>
        <mode>minor</mode>
      </key>
      <time>
        <beats>3</beats>
        <beat-type>4</beat-type>
      </time>
      <clef>
        <sign>G</sign>
        <line>2</line>
      </clef>
Attributes
Staves
The staves element indicates the number of staves in a musical part, which in this case is 2 staves
for the piano part. The staves element is optional. If it is not present, as is the case in the voice
part, there is 1 staff for the part.
Clef
The clef element is used to indicate the clef for the staff. By specifying the clef's sign and its line,
MusicXML handles both the common treble and bass clefs along with tenor, alto, percussion, tab,
and older clefs. The treble clef definition indicates that the second line from the bottom of the
staff is a G; the bass clef definition indicates that the fourth line from the bottom of the staff is an
F. The number attribute indicates the staff number if the part has more than one staff.
The clef element may also contain a clef-octave-change element after the line element. This is
used for clefs that are written either an octave higher or lower than sounding pitch. For example,
the tenor line in choral music is usually written in treble clef, an octave higher than the notes
actually sound.
The clef for this part would be represented as:
<clef>
  <sign>G</sign>
  <line>2</line>
  <clef-octave-change>-1</clef-octave-change>
</clef>
Time
To represent common and cut time signatures, use the symbol attribute of the time element.
For common time, use:
<time symbol="common">
  <beats>4</beats>
  <beat-type>4</beat-type>
</time>
For cut time, use:
<time symbol="cut">
  <beats>2</beats>
  <beat-type>2</beat-type>
</time>
Without the symbol attribute, these time signatures would appear as 4/4 and 2/2, respectively.
Musical Directions
Musical directions are used for the expression marks in a musical score that are not clearly tied to
a particular note. The beginning of the voice part in measure 2, for instance, looks like this:
<direction placement="above">
  <direction-type>
    <words default-x="15" default-y="15"
        font-size="9" font-style="italic">dolce</words>
  </direction-type>
</direction>
<note default-x="27">
  <pitch>
    <step>G</step>
    <octave>4</octave>
  </pitch>
  <duration>24</duration>
  <voice>1</voice>
  <type>quarter</type>
  <stem default-y="6">down</stem>
  <lyric default-y="-80" number="1">
    <syllabic>single</syllabic>
    <text>Dans</text>
  </lyric>
</note>
<direction placement="above">
  <direction-type>
    <wedge default-y="20" type="crescendo"/>
  </direction-type>
  <offset>-8</offset>
</direction>
This indicates that the "dolce" mark starts a little more than one space before the first note in a 9
point italic font, while the crescendo wedge starts two-thirds of the way between the first and
second notes in the measure. The placement attribute is used to indicate whether the directions go
above or below the staff.
Note Appearance
Symbolic Note Types
Given the duration of a note and the divisions value, a program can usually infer the symbolic
note type (e.g. quarter note, dotted-eighth note). However, it is much easier for notation programs
if this is represented explicitly, rather than making the program infer the correct symbolic value.
In some cases, the intended note duration does not match what is written, be it some of Bach's
dotted notations, notes inégales, or jazz swing rhythms.
The type element is used to indicate the symbolic note type, such as quarter, eighth, or 16th.
MusicXML symbolic note types range from 1024th notes to maxima notes: 1024th, 512th, 256th,
128th, 64th, 32nd, 16th, eighth, quarter, half, whole, breve, long, and maxima. The type element
may be followed by one or more empty dot elements to indicate dotted notes.
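For instance, a dotted quarter note could be encoded as in this sketch, assuming 24 divisions per quarter note. The pitch is a placeholder.

```xml
<note>
  <pitch>
    <step>G</step>
    <octave>4</octave>
  </pitch>
  <duration>36</duration>  <!-- quarter (24) plus eighth (12) -->
  <type>quarter</type>
  <dot/>
</note>
```

The duration carries the sounding length, while the type and dot elements record what is written on the page.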
Tuplets
The time-modification element is used to make it easier for applications to handle tuplets
properly. For a normal triplet, this would look like:
<time-modification>
  <actual-notes>3</actual-notes>
  <normal-notes>2</normal-notes>
</time-modification>
This indicates that three notes are placed in the time usually allotted for two notes.
There is an optional normal-type element that is used when the type of the note does not match
the type of the normal-notes in the triplet. Say you have an eighth note triplet, but instead of three
eighth notes, you have a quarter note and eighth note instead. Without a normal-type element,
software that reads the quarter note in the tuplet will likely assume that this is starting a quarter-
note triplet, not an eighth note triplet. In this case, the symbolic type and tuplet would be encoded
as:
<type>quarter</type>
<time-modification>
  <actual-notes>3</actual-notes>
  <normal-notes>2</normal-notes>
  <normal-type>eighth</normal-type>
</time-modification>
Stems
Stem direction is represented with the stem element, whose value can be up, down, none, or
double. For up and down stems, the default-y attribute represents where the stem ends, measured
in tenths of interline space from the top line of the staff.
Beams
Beams are represented by beam elements. Their value can be begin, continue, end, forward hook,
or backward hook. Each element has a number attribute, of type beam-level, which ranges from 1
to 8 for eighth-note to 1024th-note beams.
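A pair of beamed 16th notes needs two beam levels on each note, as in this sketch. It assumes 4 divisions per quarter note, and the pitches are placeholders.

```xml
<note>
  <pitch><step>C</step><octave>5</octave></pitch>
  <duration>1</duration>
  <type>16th</type>
  <beam number="1">begin</beam>  <!-- eighth-note beam -->
  <beam number="2">begin</beam>  <!-- 16th-note beam -->
</note>
<note>
  <pitch><step>D</step><octave>5</octave></pitch>
  <duration>1</duration>
  <type>16th</type>
  <beam number="1">end</beam>
  <beam number="2">end</beam>
</note>
```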
Accidentals
The accidental element represents actual notated accidentals. The most common values are sharp,
flat, natural, double-sharp, and flat-flat. Many microtonal accidental values are also available. An
accidental element has optional cautionary and editorial attributes to indicate courtesy and editorial
accidentals. The bracket, parentheses, and size attributes offer more precise visual representations
for these types of accidentals.
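A courtesy flat shown in parentheses might be encoded as in this sketch. The octave and duration are assumptions, with 24 divisions per quarter note, and the attribute that marks a courtesy accidental is named cautionary in the schema as we read it.

```xml
<note>
  <pitch>
    <step>A</step>
    <alter>-1</alter>
    <octave>4</octave>
  </pitch>
  <duration>24</duration>
  <type>quarter</type>
  <accidental cautionary="yes" parentheses="yes">flat</accidental>
</note>
```

The alter element still carries the sounding pitch; the accidental element only records what is printed.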
Notations
Many additional elements can be associated with a note. In MusicXML, these are collected under
the notations element. Tied notes, slurs, tuplets, fermatas, and arpeggios are represented by
top-level children of the notations element. Dynamics, ornaments, articulations, and technical
indications specific to particular instruments are also top-level children of the notations element.
A staccato mark would then be placed within the articulations element.
The tied element represents the visual part of a tie, and the tuplet element represents the visual
part of a tuplet. The tie element affects the sound, and the time-modification affects placement,
but the tied and tuplet elements indicate that there is something to see on the score indicating the
tie or tuplet. (With ties, the two nearly always go together, but with tuplets this is not the case.)
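The nesting for a staccato mark can be sketched as follows. The note itself is a placeholder, assuming 24 divisions per quarter note.

```xml
<note>
  <pitch>
    <step>G</step>
    <octave>4</octave>
  </pitch>
  <duration>12</duration>
  <type>eighth</type>
  <notations>
    <articulations>
      <staccato/>
    </articulations>
  </notations>
</note>
```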
The second E-flat in measure 3 of the voice part, which is the end of a tie and start of a tuplet, is
represented as:
<note>
  <pitch>
    <step>E</step>
    <alter>-1</alter>
    <octave>5</octave>
  </pitch>
  <duration>8</duration>
  <tie type="stop"/>
  <voice>1</voice>
  <type>eighth</type>
  <time-modification>
    <actual-notes>3</actual-notes>
    <normal-notes>2</normal-notes>
Multi-Part Music
MusicXML contains two elements to help distinguish what is happening in multi-part music: the
voice and staff elements.
A staff element should be used wherever possible in multi-staff music like piano parts. Note,
forward, and direction elements can all include a staff element. The first cross-staff chord in
measure 3 of the piano part is represented as:
<note default-x="26">
  <pitch>
    <step>A</step>
    <octave>3</octave>
  </pitch>
  <duration>1</duration>
  <voice>1</voice>
  <type>eighth</type>
  <accidental>natural</accidental>
  <stem default-y="91">up</stem>
  <staff>2</staff>
  <beam number="1">begin</beam>
</note>
<note default-x="26">
  <chord/>
  <pitch>
    <step>C</step>
    <octave>4</octave>
  </pitch>
  <duration>1</duration>
  <voice>1</voice>
  <type>eighth</type>
  <stem>up</stem>
  <staff>2</staff>
</note>
<note default-x="26">
  <chord/>
  <pitch>
    <step>E</step>
    <alter>-1</alter>
    <octave>4</octave>
  </pitch>
  <duration>1</duration>
  <voice>1</voice>
  <type>eighth</type>
Chord Symbols
Here is a three-bar example of a simple lead sheet. It contains the melody together with chord
symbols and diagrams for how to play the chords on a guitar:
The first chord is a G major sixth chord with the fifth (D) in the bass. The second chord is notated
as an A major chord with an added ninth degree. Another analysis might be to call it a dominant
ninth chord with a missing seventh degree. MusicXML supports both types of analysis. For this
example, we follow the written chord diagram notation. The third chord, an A11, will be
discussed in the chord diagram section, as it includes both fingerings and a barre symbol.
Here is how the first G6 chord symbol is represented in the MusicXML file, omitting the chord
diagram for the time being:
<harmony default-y="100">
  <root>
    <root-step>G</root-step>
  </root>
  <kind halign="center" text="6">major-sixth</kind>
  <bass>
    <bass-step>D</bass-step>
  </bass>
</harmony>
Each chord symbol has at least two elements: a <root> element to indicate the root of the chord,
and a kind element to indicate the type of the chord. Here, we have a root of G and a kind of
major-sixth. MusicXML 3.1 supports 33 different kind elements, listed in the direction.mod file.
The kind element has a text attribute that indicates that the chord is displayed as G6, not as
Gmaj6, GM6, or other spelling that could represent the same chord. This symbol also indicates
the bass of the chord, represented using the bass element.
Both the root and the bass element divide the pitch into step and alter elements, similar to how the
pitch element works. The root element uses the root-step and root-alter elements, while the bass
element uses the bass-step and bass-alter elements. There is no element that corresponds to the
octave element for pitch, since this information is not considered part of the harmonic analysis or
the chord symbol.
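To show the alter elements in use, here is a sketch of a B-flat minor seventh chord over an A-flat bass. This chord does not appear in the lead sheet example, and the text value "m7" is just one possible spelling.

```xml
<harmony>
  <root>
    <root-step>B</root-step>
    <root-alter>-1</root-alter>
  </root>
  <kind text="m7">minor-seventh</kind>
  <bass>
    <bass-step>A</bass-step>
    <bass-alter>-1</bass-alter>
  </bass>
</harmony>
```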
Chord Diagrams
Chord diagrams, also known as chord frames, are used to indicate how a chord is played on a
fretted instrument such as the guitar. The vertical lines in the chord diagrams represent strings,
while the horizontal spaces represent frets. An x above a string indicates that the string is muted,
while an o above a string represents an open string.
MusicXML uses the frame element to represent chord diagrams. Let us look at the harmony
element for the first G6 chord again, this time including the frame element for the chord diagram:
<harmony default-y="100">
  <root>
    <root-step>G</root-step>
  </root>
  <kind halign="center" text="6">major-sixth</kind>
  <bass>
    <bass-step>D</bass-step>
  </bass>
  <frame default-y="83" halign="center"
      relative-x="5" unplayed="x" valign="top">
    <frame-strings>6</frame-strings>
    <frame-frets>5</frame-frets>
    <frame-note>
      <string>5</string>
      <fret>5</fret>
    </frame-note>
    <frame-note>
      <string>4</string>
      <fret>5</fret>
    </frame-note>
    <frame-note>
      <string>3</string>
      <fret>4</fret>
The fret and string information needed to generate tablature for guitar and other stringed
instruments is handled the same way as technical indications for other instruments, such as piano
fingerings and violin bowings. The technical element contains these types of notations, and two
of its component elements represent the note's fret and string. Frets are numbered starting at 0 for
the open string. Strings are numbered starting at 1 for the highest string on the instrument.
The fret and string for the first note in this example are represented using:
<technical>
  <string>3</string>
  <fret>5</fret>
</technical>
String Tuning
An attributes element may include a staff-details element to specify the details of a tab staff. The
staff-lines element specifies the number of lines on a tablature staff, usually one for each string.
Staff tunings are described with the staff-tuning and capo elements. TAB is one of the values
available for clef elements. The print-object attribute of the key element is used to indicate that
the key signature should not be displayed on this staff.
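As a sketch of what a staff-details element can contain, the following assumes a six-string guitar in standard tuning with no capo. It is not copied from the example file. As we read the schema, the line attribute counts staff lines from the bottom, so line 1 carries the lowest string.

```xml
<staff-details>
  <staff-lines>6</staff-lines>
  <staff-tuning line="1">
    <tuning-step>E</tuning-step>
    <tuning-octave>2</tuning-octave>
  </staff-tuning>
  <staff-tuning line="2">
    <tuning-step>A</tuning-step>
    <tuning-octave>2</tuning-octave>
  </staff-tuning>
  <staff-tuning line="3">
    <tuning-step>D</tuning-step>
    <tuning-octave>3</tuning-octave>
  </staff-tuning>
  <staff-tuning line="4">
    <tuning-step>G</tuning-step>
    <tuning-octave>3</tuning-octave>
  </staff-tuning>
  <staff-tuning line="5">
    <tuning-step>B</tuning-step>
    <tuning-octave>3</tuning-octave>
  </staff-tuning>
  <staff-tuning line="6">
    <tuning-step>E</tuning-step>
    <tuning-octave>4</tuning-octave>
  </staff-tuning>
</staff-details>
```

A capo element, if needed, would follow the last staff-tuning element.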
The tab part in our example begins with the following attributes:
<attributes>
  <divisions>2</divisions>
  <key print-object="no">
    <fifths>0</fifths>
    <mode>major</mode>
  </key>
  <clef>
    <sign>TAB</sign>
    <line>5</line>
Unpitched Notes
To illustrate MusicXML's percussion features, here's a two-bar example for two players: one on a
drum kit, and one on cowbell:
In the drum part, the top space (B in bass clef) is used for the cymbal (diamond notehead) and hi-
hat (x notehead). The E space is used for the snare drum, and the bottom A space is used for the
bass drum. The cowbell player has only one instrument, so it is represented on a one-line staff.
Since these notes have no definite pitch, it would be misleading to represent them using the pitch
element. An analysis program looking for a series of repeated B's should not return this piece of
music. Neither should a program looking for a series of repeated F-sharps, based on the General
MIDI pitch for a closed hi-hat.
Instead, we represent percussion and other unpitched instruments with the unpitched element. As
with rests, there are display-step and display-octave elements to indicate where the note should
appear on the staff, based on the current clef. So the cymbal and hi-hat notes on this bass clef
staff are represented as:
<unpitched>
  <display-step>B</display-step>
  <display-octave>3</display-octave>
</unpitched>
Staff Lines
The cowbell part is an example where one player has one instrument, so the part is notated on one
line both for clarity and saving space. The single line is specified within the attributes element for
the part:
<attributes>
  <divisions>2</divisions>
  <key>
    <fifths>0</fifths>
    <mode>major</mode>
  </key>
  <time symbol="common">
    <beats>4</beats>
    <beat-type>4</beat-type>
  </time>
  <clef>
    <sign>percussion</sign>
  </clef>
  <staff-details>
    <staff-lines>1</staff-lines>
  </staff-details>
</attributes>
How do we determine the display-step and display-octave for a one-line staff? Since the
percussion clef is treated like a treble clef, the G in octave 4 is on the second line of the staff. For
this one-line staff, that G is one imaginary line above the staff (for a 2-line staff, it's the top line,
and the middle line on a 3-line staff). The single line itself corresponds to the bottom line of a
treble staff, which is E in octave 4. Therefore the unpitched elements for the cowbell part are:
<unpitched>
  <display-step>E</display-step>
  <display-octave>4</display-octave>
</unpitched>
Notehead Shapes
The one remaining task is to specify the alternate noteheads that distinguish the hi-hat from the
cymbal. While the MusicXML playback can distinguish these by the use of different instruments,
a drummer will certainly appreciate having different notehead shapes for different instruments on
the same line.
The MusicXML notehead element specifies these different shapes. Values can be slash, triangle,
diamond, square, cross, x, circle-x, inverted triangle, arrow down, arrow up, circled, slashed, back
slashed, normal, cluster, circle dot, left triangle, rectangle, none, do, re, mi, fa, fa up, so, la, ti, and
other. Enclosed shapes like normal, diamond, triangle, and square can use the filled attribute to
indicate a filled or hollow shape.
The first two notes of the cymbal/hi-hat line look like this:
<note default-x="78">
  <unpitched>
    <display-step>B</display-step>
    <display-octave>3</display-octave>
  </unpitched>
  <duration>1</duration>
  <instrument id="P1-X13"/>
  <voice>1</voice>
  <type>eighth</type>
  <stem default-y="40">up</stem>
  <notehead filled="no">diamond</notehead>
  <beam number="1">begin</beam>
</note>
<note default-x="109">
  <unpitched>
    <display-step>B</display-step>
    <display-octave>3</display-octave>
  </unpitched>
  <duration>1</duration>
  <instrument id="P1-X6"/>
  <voice>1</voice>
  <type>eighth</type>
  <stem default-y="40">up</stem>
  <notehead>x</notehead>
  <beam number="1">continue</beam>
</note>
The filled attribute on the notehead element is also useful for multi-part piano music. There are
many cases where, for instance, a downstem half note shares a hollow notehead with an upstem
eighth note. The eighth note can specify that it uses an unfilled normal notehead, making things
display correctly when moving back and forth between notation programs.
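A sketch of the eighth note in that situation, assuming 2 divisions per quarter note and a placeholder pitch:

```xml
<note>
  <pitch>
    <step>G</step>
    <octave>4</octave>
  </pitch>
  <duration>1</duration>
  <voice>1</voice>
  <type>eighth</type>
  <stem>up</stem>
  <!-- Hollow notehead shared with the half note in the other voice. -->
  <notehead filled="no">normal</notehead>
</note>
```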
The container element is the document element for this file. It contains a single rootfiles element.
The rootfiles element in turn contains one or more rootfile elements. The MusicXML root must
be described in the first rootfile element. The full-path attribute provides the path relative to the
root folder of the zip file. A MusicXML file used as a rootfile may have score-partwise, score-
timewise, or opus as its document element.
Additional rootfile elements can describe alternate versions such as PDF and audio files. Multiple
rootfile elements can be distinguished by each one having a different media-type attribute value.
For instance, if the Dichterliebe01.mxl file contained a PDF rendition as well as the MusicXML
file, the container.xml file would look like this:
<?xml version="1.0" encoding="UTF-8"?>
<container>
  <rootfiles>
    <rootfile full-path="Dichterliebe01.xml"
        media-type="application/vnd.recordare.musicxml+xml"/>
    <rootfile full-path="Dichterliebe01.pdf"
        media-type="application/pdf"/>
  </rootfiles>
</container>
If no media-type value is present, a MusicXML file is assumed. However, if multiple rootfiles are
present, it will probably clarify things to include the media-type attribute for all rootfile elements.
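In the simplest case, with a single MusicXML root file and no alternate renditions, a container.xml file could be as short as this sketch, which reuses the file name from the example and relies on the default media type:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<container>
  <rootfiles>
    <!-- No media-type attribute, so a MusicXML file is assumed. -->
    <rootfile full-path="Dichterliebe01.xml"/>
  </rootfiles>
</container>
```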
The zip archive can also include images that are referenced with the MusicXML image and
credit-image elements, or other media that are referenced using the MusicXML link element.
MusicXML 3.1 does not specify where these files need to be located in the zip file, but it is
probably least confusing if images or other files of a similar type are grouped together in a
subfolder within the zip archive.
The first file in the zip container should be a file named mimetype. The contents of this file
should be the MIME media type string
application/vnd.recordare.musicxml
The mimetype file should be encoded in US-ASCII and must not be compressed or encrypted,
and there must not be an extra field in its zip header. The contents of the mimetype file must not
contain any leading padding or white space, and must not begin with a byte order mark.
The mimetype file helps applications identify a compressed MusicXML file by reading the first
few bytes of the file. EPUB and other zip-based formats also use a mimetype file for the same
reason. Versions of MusicXML before 3.1 did not specify this mimetype file, so
applications may still see containers without it.