Matroska File Format


(under construction!)
Alexander Noé
[email protected]
Last change: June 24, 2007

Contents
1 Introduction
2 EBML - basics
    2.1 Unsigned Integer Values of Variable Length ("vint")
    2.2 EBML elements
    2.3 Signed Integer Values of Variable Length (svint)
    2.4 Data Types
3 Matroska files - Top-Level elements
    3.1 EBML
    3.2 Segment
4 EBML - The EBML file header
5 Level 1 - Elements inside Segments
    5.1 Overview
    5.2 SegmentInfo
    5.3 SeekHead
    5.4 Tracks
    5.5 Cluster
    5.6 Cues
    5.7 Chapters - Editions and ChapterAtoms
    5.8 Attachments
    5.9 Tags
6 Matroska block Layout and Lacing
    6.1 Basic layout of a Block
    6.2 Lacing
7 Overhead of Matroska files
    7.1 Overhead of BlockGroups
    7.2 Overhead of Clusters
    7.3 Overhead caused by Cues
8 Links

List of Tables
1 EBML
2 Segment
3 SegmentInfo
4 SeekHead
5 Seek
6 Tracks
7 TrackEntry
8 Video
9 Audio
10 ContentEncodings
11 ContentEncoding
12 ContentCompression
16 Cluster
17 BlockGroup
18 Cues
19 CuePoint
20 CueTrackPositions
21 Chapters
22 EditionEntry
23 ChapterAtom
24 ChapterTracks
25 ChapterDisplay
26 Attachments
27 AttachedFile
28 Tags
29 Tag
30 Targets
31 SimpleTag

1 Introduction

This document is intended for developers who want to implement support for the Matroska file format in their applications from scratch rather than by using existing implementations, and for people who just want to understand the Matroska file format in detail. Thus, the file format itself is described; the usage of existing libraries isn't.

This document does not replace the official documentation [1]. It is less condensed, but not necessarily complete. In particular, should Matroska support Digital Restrictions Management one day, I will expressly not document that part. Also, typos in element IDs are never impossible.

Elements can be mandatory or optional, and they may or may not be allowed to occur several times inside a parent element. Such occurrence restrictions are indicated using expressions like = 1 or ≥ 1. These restrictions also exclude cases which do not technically render a file unusable or ambiguous, but which are unreasonable, like a file with no SegmentUID (see section 5.2). In the same way, it would be weird (but would not make a file unusable) to have a Chapters element (which is supposed to describe chapters) that is empty. An element that must occur at least once in a reasonable file is called "mandatory". When an element is really mandatory, i.e. the file or a part of it is useless when it's missing, it is labeled as ≥ 1 (!) or = 1 (!). An example is the codec ID of a track, without which a track cannot be decoded at all.

The official Matroska specification pages use the following interpretation of "mandatory" and "default": when an element has a default value that is used if the element itself is not present, the value cannot be missing, thus the element is inherently mandatory. Since this interpretation of "mandatory" is weird, this document considers an element mandatory only when it must be physically present in the file. Also, default values can only be valid values. Consequently, a mandatory element cannot have a default value, because if it had one, it wouldn't be mandatory anymore.

In this document, element names are always printed like This, element values are printed like $This, as in "if $ThisFlag=1, ...".

If you have any questions concerning this document, if you have comments or additions, if you have found an error, or if you want to contact me for whatever reason, please send me an e-mail (include 'matroska' in the subject!). You can contact me in German, English or French, whatever you prefer. Just don't ask me if you can ask something or if I could document some Digital Restrictions Management.

This document is powered by LaTeX, so changing the order of certain tables or the style of those tables etc. is, within certain limits, possible within a few seconds.

Screenshots of real-life file structures are used to illustrate the file structure. All of them have been made using the EBML Tree Viewer in AVI-Mux GUI.

[1] http://www.matroska.org/technical/specs/index.html

2 EBML - basics

EBML files use integers of variable size. This way, the file format doesn't waste space by storing 32 or even 64 bit integers in places where that many bits are only occasionally needed. The way the size is coded is inspired by the UTF-8 encoding format.

2.1 Unsigned Integer Values of Variable Length ("vint")

The length of an integer in bytes is length = 1 + number_of_leading_zero_bits.

All integers are stored in big endian byte order. More than 7 leading zeros could be used, in which case the first byte would be 0x00; however, this would only be needed if integers longer than 56 bits were required. This is forbidden in Matroska files.

Example: 3A 41 FE:
The first byte 3A (0011 1010) has 2 leading zeros, resulting in a total length of 3 bytes. The first '1' in the byte (0011 1010) is only needed to terminate the sequence of leading zeros and can't be used to store the value. Thus, it is cleared to obtain the value this byte sequence represents. The result is 0x1A41FE. As you can see, you lose one bit per byte to indicate how long a number is, and you can use 7 bits per byte to store the integer's value itself.

Of course, the value 0x1A41FE could also be written as 10 1A 41 FE or 08 00 1A 41 FE (do the decoding on a piece of paper if it's not clear). However, when writing EBML files, the shortest possible encoding should be used to avoid wasting space, which is the very point of this coding scheme.
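
The decoding rule above can be sketched in C as follows. This is only a minimal illustration, not a reference implementation: the function names vint_length and vint_decode are hypothetical, and the sketch assumes the whole vint is already available in a memory buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Total length (1..8) announced by the first byte, or 0 if the first byte
   is 0x00 (more than 7 leading zeros, which Matroska forbids). */
static int vint_length(uint8_t first)
{
    int len = 1;
    if (first == 0)
        return 0;
    while (!(first & 0x80)) {   /* count leading zero bits */
        first = (uint8_t)(first << 1);
        len++;
    }
    return len;
}

/* Decodes one vint from buf (avail bytes readable). On success the value,
   with the length marker bit cleared, is stored in *value and the number
   of bytes consumed is returned; 0 is returned on error. */
static int vint_decode(const uint8_t *buf, size_t avail, uint64_t *value)
{
    assert(buf != NULL && value != NULL);
    if (avail == 0)
        return 0;
    int len = vint_length(buf[0]);
    if (len == 0 || (size_t)len > avail)
        return 0;
    uint64_t v = buf[0] & (0xFFu >> len);   /* drop zeros and marker bit */
    for (int i = 1; i < len; i++)
        v = (v << 8) | buf[i];              /* big endian */
    *value = v;
    return len;
}
```

With this sketch, the byte sequences 3A 41 FE, 10 1A 41 FE and 08 00 1A 41 FE from the example all decode to 0x1A41FE, consuming 3, 4 and 5 bytes respectively.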
Unknown Length
A vint in which all bits after the leading zeros are set to one, such as FF or 7F FF, indicates an unknown length. Muxers shall avoid writing unknown length values whenever possible. The only exception is the last Level 0 element of a file. If encoding a number as described above results in such a sequence, it must be encoded again with a greater length. Example: encoding 16383 as described above yields 7F FF. In 7F FF, all bits after the leading zero are set, which would indicate an unknown length. Therefore, the length is increased to 3, and the number is encoded again as 20 3F FF.
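
A possible encoder that follows both rules (shortest encoding, but never the reserved all-ones pattern) is sketched below. The name vint_encode and its buffer-based interface are assumptions made for this illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes value as the shortest vint that is not the reserved all-ones
   ("unknown length") pattern. Returns the number of bytes written (1..8),
   or 0 if the value does not fit or out is too small. */
static int vint_encode(uint64_t value, uint8_t *out, size_t cap)
{
    assert(out != NULL);
    for (int len = 1; len <= 8; len++) {
        /* largest number representable in 7*len value bits */
        uint64_t all_ones = (UINT64_C(1) << (7 * len)) - 1;
        if (value < all_ones) {             /* strictly less: all-ones is reserved */
            if ((size_t)len > cap)
                return 0;
            /* set the marker bit that terminates the leading zeros */
            uint64_t coded = value | (UINT64_C(1) << (7 * len));
            for (int i = len - 1; i >= 0; i--) {
                out[i] = (uint8_t)(coded & 0xFF);
                coded >>= 8;
            }
            return len;
        }
    }
    return 0;
}
```

For 16383 the two-byte form would be 7F FF, so the loop moves on to three bytes and writes 20 3F FF, as in the example above.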
Note
It is possible to use a lookup table to determine the total length from the first byte. The Matroska file format does not allow integer lengths greater than 8, meaning that the number of leading zeros is never higher than 7 and that the total length can always be retrieved from the first byte.

2.2 EBML elements

One piece of information is stored the following way:

typedef struct {
    vint ID;          // EBML-ID
    vint size;        // size of the data section in bytes
    char data[size];  // data (pseudocode: the length is only known at run time)
} EBML_ELEMENT;

The length of ID shall be called s_ID, the length of size shall be called s_size.
Elements that contain other EBML Elements are called EBML Master elements.
Generally, the order of EBML elements inside a parent element is not fixed. In
some cases, a certain order is recommended, but it is never mandatory. Especially,
no element order should be assumed inside small parent elements.
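
Reading the ID and size fields of one element could look like the following sketch. The names ebml_header and ebml_read_header are hypothetical. Two assumptions are made here: the ID is kept with its length marker bit, so that it compares directly with the IDs as they are printed in the tables of this document, and IDs are at most 4 bytes long, as is the case for Matroska.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t id;       /* EBML-ID, marker bit kept (e.g. 0x4286) */
    uint64_t size;     /* size of the data section in bytes */
    int      s_id;     /* length of the ID field */
    int      s_size;   /* length of the size field */
    int      unknown;  /* nonzero if size is the "unknown length" pattern */
} ebml_header;

/* 1 + number of leading zero bits, or 0 for a 0x00 byte */
static int lead_len(uint8_t b)
{
    int n = 1;
    if (b == 0)
        return 0;
    while (!(b & 0x80)) {
        b = (uint8_t)(b << 1);
        n++;
    }
    return n;
}

/* Parses ID and size at p. Returns s_id + s_size, i.e. the offset of the
   data section relative to p, or 0 on error. */
static int ebml_read_header(const uint8_t *p, size_t avail, ebml_header *h)
{
    assert(p != NULL && h != NULL);
    if (avail < 2)
        return 0;
    int s_id = lead_len(p[0]);
    if (s_id == 0 || s_id > 4 || (size_t)s_id >= avail)
        return 0;
    uint32_t id = 0;
    for (int i = 0; i < s_id; i++)
        id = (id << 8) | p[i];              /* marker bit is kept */
    int s_size = lead_len(p[s_id]);
    if (s_size == 0 || (size_t)(s_id + s_size) > avail)
        return 0;
    uint64_t size = p[s_id] & (0xFFu >> s_size);
    for (int i = 1; i < s_size; i++)
        size = (size << 8) | p[s_id + i];
    h->id = id;
    h->size = size;
    h->s_id = s_id;
    h->s_size = s_size;
    h->unknown = (size == (UINT64_C(1) << (7 * s_size)) - 1);
    return s_id + s_size;
}
```

A parser would call this function at the start of every element and then either descend into the data section (for Master elements) or skip size bytes.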

2.3 Signed Integer Values of Variable Length (svint)

Signed integers have the following value: Read the integer as Unsigned Integer
and then subtract

vsint_subtr[length-1]

where

int64_t vsint_subtr [] =   // needs <stdint.h>; entry n-1 is 2^(7n-1) - 1
{ 0x3F, 0x1FFF, 0x0FFFFF, 0x07FFFFFF,
0x03FFFFFFFF, 0x01FFFFFFFFFF,
0x00FFFFFFFFFFFF, 0x007FFFFFFFFFFFFF };
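
As a sketch of the rule above, the following function reads a signed value. Instead of a table it computes the number to subtract as 2^(7*length-1) - 1, which follows from a vint of the given length having 7*length value bits (0x3F for length 1, 0x1FFF for length 2, and so on). The name svint_decode is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decodes one signed variable-length integer. Returns the number of bytes
   consumed, or 0 on error. */
static int svint_decode(const uint8_t *buf, size_t avail, int64_t *value)
{
    assert(buf != NULL && value != NULL);
    if (avail == 0 || buf[0] == 0)
        return 0;
    int len = 1;
    for (uint8_t b = buf[0]; !(b & 0x80); b = (uint8_t)(b << 1))
        len++;                              /* 1 + leading zero bits */
    if ((size_t)len > avail)
        return 0;
    uint64_t u = buf[0] & (0xFFu >> len);   /* read as unsigned vint */
    for (int i = 1; i < len; i++)
        u = (u << 8) | buf[i];
    int64_t bias = (INT64_C(1) << (7 * len - 1)) - 1;
    *value = (int64_t)u - bias;             /* u < 2^56, so this cannot overflow */
    return len;
}
```

With one byte, BF therefore represents 0, 80 represents -63 and C0 represents 1.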

2.4 Data Types

Whereas vints are used in the header section of EBML elements, the data types
described in this section occur in the data section.

2.4.1 Signed and Unsigned Integers (int and uint)

Integers, signed as well as unsigned, are stored in big endian byte order, with
leading 0x00 (in case of positive values) and 0xFF (in case of negative values)
being cut off (example for int: -257 is 0xFE 0xFF). An int/uint may not be larger
than 8 bytes.
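
Reading such a shortened integer back means restoring the bytes that were cut off. The following sketch does that for a data section of n bytes; the function names are hypothetical, and the final conversion from uint64_t to int64_t assumes a two's complement machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reads an unsigned integer stored in n (0..8) big endian bytes. */
static uint64_t ebml_read_uint(const uint8_t *d, size_t n)
{
    assert(n <= 8 && (d != NULL || n == 0));
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++)
        v = (v << 8) | d[i];
    return v;
}

/* Reads a signed integer stored in n (0..8) big endian bytes. The missing
   leading bytes are 0xFF if the first stored byte has its top bit set,
   0x00 otherwise (sign extension). */
static int64_t ebml_read_int(const uint8_t *d, size_t n)
{
    assert(n <= 8 && (d != NULL || n == 0));
    uint64_t v = (n > 0 && (d[0] & 0x80)) ? UINT64_MAX : 0;
    for (size_t i = 0; i < n; i++)
        v = (v << 8) | d[i];
    return (int64_t)v;   /* assumes two's complement representation */
}
```

For the example above, the two bytes FE FF are read as -257 by ebml_read_int, whereas ebml_read_uint would return 65279 for the same bytes.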

2.4.2 Float

A Float value is a 32 or 64 bit real number, as defined in IEEE 754. 80 bit values
used to be part of the specification, but have been removed and should not be used. The
bytes are stored in big endian order.
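
A Float data section could be read as sketched below. The function name ebml_read_float is hypothetical, and the sketch assumes that the host's float and double types use the IEEE binary32 and binary64 formats, which holds on common platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reads a 4 or 8 byte big endian float into *out. Returns 1 on success
   and 0 for any other length (including the removed 80 bit format). */
static int ebml_read_float(const uint8_t *d, size_t n, double *out)
{
    assert(d != NULL && out != NULL);
    if (n == 4) {
        uint32_t bits = 0;
        for (size_t i = 0; i < 4; i++)
            bits = (bits << 8) | d[i];
        float f;
        memcpy(&f, &bits, sizeof f);   /* reinterpret the bit pattern */
        *out = f;
        return 1;
    }
    if (n == 8) {
        uint64_t bits = 0;
        for (size_t i = 0; i < 8; i++)
            bits = (bits << 8) | d[i];
        double x;
        memcpy(&x, &bits, sizeof x);
        *out = x;
        return 1;
    }
    return 0;
}
```

Assembling the integer first and copying it afterwards keeps the code independent of the byte order of the host.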

2.4.3 Types of Strings

String refers to an ASCII string.


UTF-8 refers to a string that is encoded as UTF-8

3 Matroska files - Top-Level elements

Matroska files only have two different top level elements:

3.1 EBML

This header describes the contents of an EBML file. There should be only one
EBML header in one file. Any further EBML headers do not render a file invalid,
but shall be ignored by any application reading the file. Files with more than one
EBML header could be created for instance if two or more files are appended by
using the copy /b command.

3.2 Segment

A Segment contains multimedia data, as well as any header data necessary for replay. There can be several Segments in one Matroska file, but this is discouraged, as not many tools are able to handle multisegment Matroska files correctly. If you want to replay multisegment Matroska files on Windows, please use Haali Media Splitter [2].

[2] http://haali.cs.msu.ru/mkv/

4 EBML - The EBML file header

The EBML top level element contains a description of the file type, such as EBML
version, file type name, file type version etc.

If this header is missing, the file type has to be guessed.

Table 1: The EBML element (Top-Level)

EBMLVersion (uint, # ≤ 1; ID: 42 86; def: 1)
    indicates the version of the EBML writer that has been used to create the file
EBMLReadVersion (uint, # ≤ 1; ID: 42 F7; def: 1)
    indicates the minimum version an EBML parser needs to be compliant with to be able to read the file
EBMLMaxIDLength (uint, # ≤ 1; ID: 42 F2; def: 4)
    indicates the length of the longest EBML-ID the file contains. In the case of Matroska, this value is 4. Any EBML-ID which is longer than the value of this element shall be considered invalid.
EBMLMaxSizeLength (uint, # ≤ 1; ID: 42 F3; def: 8)
    indicates the maximum s_size value the file contains. Any EBML element having an s_size value greater than $EBMLMaxSizeLength should be considered invalid.
DocType (string, # ≤ 1; ID: 42 82; def: matroska)
    describes the contents of the file. In the case of a Matroska file, its value is 'matroska'.
DocTypeVersion (uint, # ≤ 1; ID: 42 87; def: 1)
    indicates the version of the $DocType writer used to create the file
DocTypeReadVersion (uint, # ≤ 1; ID: 42 85; def: 1)
    indicates the minimum version number a $DocType parser must be compliant with to read the file

As you can see, in the case of Matroska files all child elements of the EBML element have a default value. Thus, an empty EBML element would technically introduce a Matroska file (with file type version 1, maximum ID length 4, maximum size length 8 etc.) correctly. However, I don't recommend pushing the specification like this.
It is not recommended to use either IDs or size values longer than 8 bytes. While it's clear that 8 bytes are enough to represent any size of anything on any hard disc, one might think about using IDs longer than 8 bytes. However, since the ID is treated as an integer, handling IDs longer than 8 bytes is difficult on current CPUs, which are limited to 64 bits for simple integer operations.
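
Since every child of the EBML element has a default, a parser can start from the defaults and overwrite only the values it actually finds. The sketch below illustrates this idea. The struct, the function names and the particular acceptance check are assumptions made for this example, not something the specification prescribes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint64_t ebml_version;
    uint64_t ebml_read_version;
    uint64_t max_id_length;
    uint64_t max_size_length;
    char     doc_type[32];
    uint64_t doc_type_version;
    uint64_t doc_type_read_version;
} ebml_head;

/* Fills in the default values listed in Table 1. */
static void ebml_head_set_defaults(ebml_head *h)
{
    assert(h != NULL);
    h->ebml_version = 1;
    h->ebml_read_version = 1;
    h->max_id_length = 4;
    h->max_size_length = 8;
    strcpy(h->doc_type, "matroska");
    h->doc_type_version = 1;
    h->doc_type_read_version = 1;
}

/* Hypothetical check: can an EBML version 1 parser that supports Matroska
   up to parser_doc_type_version read a file with this header? */
static int ebml_head_is_readable(const ebml_head *h,
                                 uint64_t parser_doc_type_version)
{
    assert(h != NULL);
    return h->ebml_read_version <= 1
        && h->max_id_length <= 4
        && h->max_size_length <= 8
        && strcmp(h->doc_type, "matroska") == 0
        && h->doc_type_read_version <= parser_doc_type_version;
}
```

A header that is completely empty then behaves exactly like one in which all default values have been written explicitly.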

5 Level 1 - Elements inside Segments

5.1 Overview

Table 2: The Segment element (Top-Level)

SegmentInfo (Master, # = 1; ID: 15 49 A9 66; → Table 3)
    SegmentInfo contains general information about a segment, like a UID, a title etc. This information is not really required for playback, but should be there (→ section 5.2).
SeekHead (Master, # ≥ 0; ID: 11 4D 9B 74; → Table 4)
    A SeekHead is an index of elements that are children of Segment. It can point to other SeekHeads, but not to itself. If all non-Cluster elements precede all Clusters (→ section 5.5), a SeekHead is not really necessary; otherwise, a missing SeekHead leads to long file loading times or the inability to access certain data.
Cluster (Master, # ≥ 0; ID: 1F 43 B6 75; → Table 16)
    A Cluster contains video, audio and subtitle data. Note that a Matroska file could contain chapter data or attachments, but no multimedia data, so Cluster is not a mandatory element.
Tracks (Master, # ≥ 0; ID: 16 54 AE 6B; → Table 6)
    A Tracks element contains the description of some or all tracks (preferably all). This element can be repeated once in a while for backup purposes. A file containing only chapters and attachments does not have a Tracks element, thus it's not mandatory.
Cues (Master, # ≤ 1; ID: 1C 53 BB 6B; → Table 18)
    The Cues element contains a timestamp-wise index to Clusters, thus it's helpful for easy and quick seeking.
Attachments (Master, # ≤ 1; ID: 19 41 A4 69; → Table 26)
    The Attachments element contains all files attached to this Segment.
Chapters (Master, # = 1; ID: 10 43 A7 70; → Table 21)
    The Chapters element contains the definition of all chapters and editions of this Segment.
Tags (Master, # ≤ 1; ID: 12 54 C3 67; → Table 28)
    The Tags element contains further information about the Segment or elements inside the Segment that is not really required for playback.

5.2 SegmentInfo

The SegmentInfo element contains general information about the Segment, such as its duration, the application used for writing the file, the date of creation and a unique 128 bit ID, to name only a few. Information included in the SegmentInfo element is not required for playback, but should be written by any Matroska muxer.

(read: <element name> (<s_size + s_ID>: <size> bytes at <position in file>: value)

Table 3: The SegmentInfo element, child of Segment (→ Table 2)

SegmentUID (char[16], # = 1; ID: 73 A4)
    a unique 128 bit number identifying a Segment. Obviously, a file can only be referred to by another file if a SegmentUID is present; however, playback is possible without that UID.
SegmentFilename (utf-8, # ≤ 1; ID: 73 84)
    contains the name of the file the Segment is stored in. Since renaming files is easy, the reliability of this element's value should not be overrated.
PrevUID (char[16], # ≤ 1; ID: 3C B9 23)
    contains the unique 128 bit ID of the Segment that is replayed before the currently active Segment, i.e. the ID of the Segment that should be loaded if the user tries to seek to a timecode earlier than the earliest timecode of the active Segment. That Segment should, of course, be easy to locate, for instance in a file in the same directory.
PrevFilename (utf-8, # ≤ 1; ID: 3C 83 AB)
    contains the name of the file in which the Segment having the ID $PrevUID is stored. PrevFilename should not be considered reliable for the same reason as SegmentFilename; however, it could be the first filename the player looks for when the Segment described in PrevUID is needed.
NextUID (char[16], # ≤ 1; ID: 3E B9 23)
    contains the unique 128 bit ID of the Segment that is replayed after the currently active Segment, i.e. the ID of the Segment that should be loaded if the user tries to seek to a timecode after the end of the active Segment. Like PrevUID, the corresponding Segment should be easy to locate.
NextFilename (utf-8, # ≤ 1; ID: 3E 83 BB)
    contains the name of the file in which the Segment having the ID $NextUID is stored. NextFilename shall not be considered reliable for the same reason as SegmentFilename.
TimecodeScale (uint, # ≤ 1; ID: 2A D7 B1)
    Each scaled timecode in a Matroska file is multiplied by $TimecodeScale to obtain a timecode in nanoseconds. Note that not all timecodes are scaled!
Duration (float, # ≤ 1; ID: 44 89)
    indicates the duration of the Segment. The value is scaled, so the duration in nanoseconds is equal to $Duration * $TimecodeScale. This element should be written.
Title (utf-8, # ≤ 1; ID: 7B A9)
    contains a general name of the Segment, like "Lord of the Rings - The Two Towers". No language can be attached to the title; however, Tags (→ section 5.9) could be used to define several titles for a segment. This is not yet commonly done, though.
MuxingApp (string, # = 1; ID: 4D 80)
    contains the name of the library that has been used to create the file (like "libmatroska 0.7.0"). This element should be written by any muxer! Especially if non-compliant files are encountered, this helps to know who must be blamed for that file.
WritingApp (utf-8, # = 1; ID: 57 41)
    contains the name of the application used to create the file (like "mkvmerge 0.8.1"). This element should be written for the same reason as MuxingApp.
DateUTC (int, # ≤ 1; ID: 44 61)
    contains the production date, measured in nanoseconds relative to Jan 01, 2001, 0:00:00 GMT+0h
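
The arithmetic behind TimecodeScale, Duration and DateUTC can be sketched as follows. The function names are hypothetical. The constant 978307200 is the number of seconds between Jan 01, 1970 and Jan 01, 2001 (both 0:00:00 UTC), which converts a DateUTC value into a Unix timestamp. The scale of 1000000 used in the example further down is only an example value.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds from 1970-01-01 00:00:00 UTC to 2001-01-01 00:00:00 UTC */
#define MATROSKA_EPOCH_UNIX INT64_C(978307200)

/* Scaled timecode -> nanoseconds */
static uint64_t scaled_to_ns(uint64_t scaled, uint64_t timecode_scale)
{
    assert(timecode_scale > 0);
    return scaled * timecode_scale;
}

/* $Duration (a float element) -> nanoseconds */
static double duration_ns(double duration, uint64_t timecode_scale)
{
    assert(timecode_scale > 0);
    return duration * (double)timecode_scale;
}

/* $DateUTC (nanoseconds relative to 2001-01-01) -> Unix time in seconds,
   rounded towards negative infinity */
static int64_t date_utc_to_unix(int64_t date_utc_ns)
{
    int64_t secs = date_utc_ns / 1000000000;
    if (date_utc_ns % 1000000000 < 0)
        secs--;                      /* C division truncates towards zero */
    return secs + MATROSKA_EPOCH_UNIX;
}
```

For example, with a $TimecodeScale of 1000000 (one millisecond per tick), a scaled timecode of 1500 corresponds to 1.5 seconds and a $Duration of 5400000 to 90 minutes.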

5.3 SeekHead

The SeekHead element contains a list of positions of Level 1 elements in the Segment. Each pair (element ID, position) is stored in one Seek element:

Table 4: The SeekHead element, child of Segment (→ Table 2)

Seek (Master, # ≥ 1; ID: 4D BB; → Table 5)
    One Seek element contains an EBML-ID and the position within the Segment at which an element with this ID can be found.

Table 5: The Seek element, child of SeekHead (→ Table 4)

SeekID (uint, # = 1; ID: 53 AB)
    The SeekID element contains the EBML-ID of the element found at the given position.
SeekPosition (uint, # = 1; ID: 53 AC)
    The SeekPosition element contains the position, relative to the Segment's data, at which an element with the ID $SeekID can be found.

Not all Level 1 elements need to be included. Typical SeekHeads either include a list of all Level 1 elements, or a list of all Level 1 elements except for Clusters (→ section 5.5). SeekHeads can also include references to other SeekHeads if there is, for example, a small SeekHead at the beginning of the file and a larger one at its end.
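
Because $SeekPosition is relative to the Segment's data and not to the beginning of the file, a reader has to add the position of the Segment's data section. A minimal sketch, with a hypothetical function name and assuming the reader remembered where the Segment element starts and how long its ID and size fields are:

```c
#include <assert.h>
#include <stdint.h>

/* segment_pos: file offset of the first byte of the Segment's ID
   s_id, s_size: lengths of the Segment's ID and size fields
   Returns the absolute file offset of the element a Seek entry refers to. */
static uint64_t seek_to_file_offset(uint64_t segment_pos, int s_id, int s_size,
                                    uint64_t seek_position)
{
    assert(s_id >= 1 && s_id <= 4 && s_size >= 1 && s_size <= 8);
    uint64_t segment_data_pos = segment_pos + (uint64_t)s_id + (uint64_t)s_size;
    return segment_data_pos + seek_position;
}
```

If, for instance, a Segment starts at file offset 40 with a 4 byte ID and an 8 byte size field, a $SeekPosition of 100 refers to file offset 152.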

The following picture illustrates the SeekHead element in a real file. Note that the EBML Tree Viewer replaced Level 1 IDs in SeekID with their human-readable names:

5.4 Tracks

The Tracks element contains information about the tracks that are stored in the Segment, like track type (audio, video, subtitles), the codec used, resolution and sample rate. All tracks shall be described in one (or more, but preferably only one) Tracks element.
Each track is described in one TrackEntry. Theoretically, information about one track could be spread over different TrackEntry elements, with the TrackUID indicating which track the information applies to. However, stretching the specification like this is highly discouraged.
Also, an empty Tracks element would be rather useless, but should not lead to a parser error, since the file can be played if all tracks are defined somewhere. In particular, pure chapter files might have an empty Tracks element if the muxer doesn't catch the case that no tracks are present and consequently creates an empty Tracks element.
An example of a TrackEntry element can be found on (→ page 26).

Table 6: The Tracks element, child of Segment (→ Table 2)

TrackEntry (Master, # ≥ 1; ID: AE; → Table 7)
    One TrackEntry element describes one track of the Segment.

Table 7: The TrackEntry element, child of Tracks (→ Table 6)

TrackNumber (uint, # = 1 (!); ID: D7)
    defines an identification number of the track. This number cannot be equal to 0. This number is used by the Block and SimpleBlock structures.
TrackUID (uint, # = 1; ID: 73 C5)
    is a unique identifier of the track within the file. It cannot be equal to 0.
TrackType (uint, # = 1 (!); ID: 83; → Table 13)
    defines the type of a track, i.e. video, audio, subtitle etc.
FlagEnabled (bool, # ≤ 1; ID: B9; def: 1)
    When FlagEnabled is 1, the track is used.
FlagDefault (bool, # ≤ 1; ID: 88; def: 1)
    When FlagDefault is 1, the track should be selected by the player by default. Obviously, if no video track and/or no audio track has a default flag, one video track and one audio track should be chosen by the player, whereas no subtitle should be enabled if no subtitle has a default flag.
FlagForced (bool, # ≤ 1; ID: 55 AA; def: 0)
    When FlagForced is 1, the track must be played. When several subtitle tracks are forced, the one matching the audio language should be chosen. An example would be a subtitle track that cannot be disabled, like the one you find on the German DVD "Eiskalte Engel" when you select English audio. Since this flag can only be used to apply a restriction on digital content, it must be qualified as Digital Restrictions Management.
FlagLacing (bool, # ≤ 1; ID: 9C; def: 0)
    When FlagLacing is 1, the track may contain laced blocks. A parser that supports all types of lacing (→ section 6.2) can safely ignore this flag.
MinCache (uint, # ≤ 1; ID: 6D E7; def: 0)
    indicates the number of frames a player must be able to cache during playback. This is for instance interesting if a native MPEG4 file with frames in coding order is played.
MaxCache (uint, # ≤ 1; ID: 6D F8)
    indicates the maximum cache size a player needs to cache frames. A value of 0 means that no cache is required.
DefaultDuration (uint, # ≤ 1; ID: 23 E3 83)
    This value indicates the number of nanoseconds a frame lasts. This value is applied if no $Duration value is indicated for a frame or if lacing (→ section 6.2) is used. A value of 0 means that the duration of frames of the track is not necessarily constant (e.g. variable framerate video, or Vorbis audio). DefaultDuration should be written for each track with a constant frame rate since it makes seeking easier.
TrackTimecodeScale (float, # ≤ 1; ID: 23 31 4F)
    Every timecode of a block (cluster timecode + block timecode) is multiplied by this value to obtain the real timecode of a block.
Name (utf-8, # ≤ 1; ID: 53 6E)
    A Name element contains a human-readable name for the track. Note that you can't define which language this track name is in. You have to use Tags (→ section 5.9) if you want to use several titles in different languages for the same track.
Language (string, # ≤ 1; ID: 22 B5 9C; def: eng)
    specifies the language of a track, using ISO 639-2 [3]. This is NOT necessarily the language of $Name; for example, a German AC3 track could be called "German - AC3 5.1" or "Deutsch - AC3 5.1" or "Allemand AC3 5.1" etc.
CodecID (string, # = 1 (!); ID: 86)
    The CodecID specifies the Codec [4] which is used to decode the track.
CodecPrivate (binary, # ≤ 1; ID: 63 A2)
    CodecPrivate contains information the codec needs before decoding can be started. An example is the Vorbis initialization packets for Vorbis audio.
CodecName (utf-8, # ≤ 1; ID: 25 86 88)
    CodecName is a human-readable name of the Codec.
AttachmentLink (uint, # ≥ 0; ID: 74 46)
    An AttachmentLink contains the UID of an attachment that is used by this track.
Video (Master, # ≤ 1; ID: E0; → Table 8)
    Video contains information that is specific to video tracks.
Audio (Master, # ≤ 1; ID: E1; → Table 9)
    Audio contains information that is specific to audio tracks.
ContentEncodings (Master, # ≤ 1; ID: 6D 80; → Table 10)
    ContentEncodings contains information about (lossless) compression or encryption of the track.

[3] http://lcweb.loc.gov/standards/iso639-2/englangn.html
[4] http://matroska.org/technical/specs/codecid/index.html

Obviously, the Video element must be present for video tracks, whereas the Audio
element must be present for audio tracks. Although it doesn't make sense to have
both elements in one TrackEntry element, it wouldn't make a file unplayable.

Table 8: The Video element, child of TrackEntry (→ Table 7)

PixelWidth (uint, # = 1; ID: B0)
    Width of the encoded video in pixels
PixelHeight (uint, # ≤ 1; ID: BA)
    Height of the encoded video in pixels
PixelCropBottom (uint, # ≤ 1; ID: 54 AA; def: 0)
    Number of pixels to be cropped from the bottom
PixelCropTop (uint, # ≤ 1; ID: 54 BB; def: 0)
    Number of pixels to be cropped from the top
PixelCropLeft (uint, # ≤ 1; ID: 54 CC; def: 0)
    Number of pixels to be cropped from the left
PixelCropRight (uint, # ≤ 1; ID: 54 DD; def: 0)
    Number of pixels to be cropped from the right
DisplayWidth (uint, # ≤ 1; ID: 54 B0; def: $PixelWidth)
    Width of the video during playback
DisplayHeight (uint, # ≤ 1; ID: 54 BA; def: $PixelHeight)
    Height of the video during playback
DisplayUnit (uint, # ≤ 1; ID: 54 B2; def: 0)
    Unit $DisplayWidth and $DisplayHeight are measured in. This can be 0→pixels, 1→centimeters, 2→inches.

$PixelCropXXXX is applied to $PixelXXXX, so the output is cropped after decoding, but before stretching it to the dimensions indicated with $DisplayXXXX.

Table 9: The Audio element, child of TrackEntry (→ Table 7)

SamplingFrequency (uint, # ≤ 1; ID: B5; def: 8 kHz)
    Indicates the sample rate the track is encoded at, in Hz
OutputSamplingFrequency (uint, # ≤ 1; ID: 78 B5)
    Indicates the sample rate the track must be played at, in Hz. The default value of this element is equal to $SamplingFrequency.
Channels (uint, # ≤ 1; ID: 9F; def: 1)
    Number of channels of the audio track
BitDepth (uint, # ≤ 1; ID: 62 64)
    Bits per sample; this is usually used with PCM audio.

Table 10: The ContentEncodings element, child of TrackEntry (→ Table 7)

ContentEncoding (Master, # ≥ 1; ID: 62 40; → Table 11)
    A ContentEncoding element describes one compression or encryption that has been used on this track.

Table 11: The ContentEncoding element, child of ContentEncodings (→ Table 10)

ContentEncodingOrder (uint, # ≤ 1; ID: 50 31; def: 0)
    Tells when to decode according to this pattern. The decoder starts with the ContentEncoding that has the highest $ContentEncodingOrder.
ContentEncodingScope (uint, # ≤ 1; ID: 50 32; def: 1; → Table 14)
    Defines which parts of the track are compressed or encrypted this way
ContentEncodingType (uint, # ≤ 1; ID: 50 33; def: 0)
    Describes which type of encoding is described. 0 → compression, 1 → encryption
ContentCompression (Master, # ≤ 1; ID: 50 34; → Table 12)
    If $ContentEncodingType=0, this element describes how it is compressed.
ContentEncryption (Master, # ≤ 1; ID: 50 35; → ??)
    If $ContentEncodingType=1, this element describes how it is encrypted.

The CONTENTENCODING element makes it possible to apply not only encryption, but also lossless compression to a track. This can be used to compress text subtitles, but also to remove sync headers from audio packets. For example, each AC3 frame starts with 0B 77, and there is no real point in storing those two bytes for each frame in a MATROSKA file. In a plain AC3 file, the sync word does make sense, because there it can be used to find the start of the next frame if data is damaged.
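The header stripping idea described above can be sketched as follows. This is only an illustration in Python, not code taken from any MATROSKA library: the function names strip_header and restore_header are made up for this example, and the only fact taken from the text is that AC3 frames start with the bytes 0B 77.

```python
AC3_SYNC = bytes([0x0B, 0x77])  # sync word named in the text


def strip_header(frame: bytes, header: bytes) -> bytes:
    """Muxer side (hypothetical helper): drop the constant leading bytes."""
    if not frame.startswith(header):
        raise ValueError("frame does not start with the expected header")
    return frame[len(header):]


def restore_header(payload: bytes, header: bytes) -> bytes:
    """Demuxer side (hypothetical helper): put the stored bytes back in front."""
    return header + payload


frame = AC3_SYNC + b"\x12\x34\x56"
stored = strip_header(frame, AC3_SYNC)
restored = restore_header(stored, AC3_SYNC)
```

In this sketch the header argument plays the role of the bytes a muxer would keep once per track, so that each stored frame is two bytes shorter.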

Table 12: The CONTENTCOMPRESSION element, child of CONTENTENCODING (→11)

CONTENTCOMPALGO (→15): uint, # ≤ 1, ID: 42 54, def: 0
    The CONTENTCOMPALGO element says which algorithm was used for this compression.
CONTENTCOMPSETTINGS: binary, # ≤ 1, ID: 42 55
    Contains settings that are required for decompression. These settings are specific to each compression algorithm. For example, it contains the stripped header bytes when $CONTENTCOMPALGO=3 (see the DTS example following Table 14).
end of CONTENTCOMPRESSION

Table 13: Values of TRACKTYPE, child of TRACKENTRY (→7)

Value   Description
0x01    track is a video track
0x02    track is an audio track
0x03    track is a complex track, i.e. a combined video and audio track
0x10    track is a logo track
0x11    track is a subtitle track
0x12    track is a button track
0x20    track is a control track
end of TRACKTYPE

Table 14: Bits in CONTENTENCODINGSCOPE, child of CONTENTENCODING (→11)

Value   Description
1       all frames
2       the track's CODECPRIVATE
4       the CONTENTCOMPRESSION in the next CONTENTENCODING (next as in next in decoding order)
end of CONTENTENCODINGSCOPE
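Since the scope values in Table 14 are bits, they can be combined. A minimal Python sketch of how a reader might decode such a value is shown below; the names SCOPE_BITS and describe_scope are invented for this illustration and do not come from any MATROSKA library.

```python
# Bit values and meanings as listed in Table 14.
SCOPE_BITS = {
    1: "all frames",
    2: "the track's CodecPrivate",
    4: "the ContentCompression in the next ContentEncoding",
}


def describe_scope(value: int) -> list[str]:
    """Return the meaning of every bit that is set in `value`."""
    return [name for bit, name in SCOPE_BITS.items() if value & bit]


default_scope = describe_scope(1)   # the default value of the element
combined_scope = describe_scope(3)  # frames and CodecPrivate
```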


Here is one example of a possible TRACKENTRY element: a DTS audio track that uses header stripping. The CONTENTCOMPSETTINGS element contains the four bytes each DTS frame starts with.


Table 15: Values of CONTENTCOMPALGO, child of CONTENTCOMPRESSION (→12)

Value   Description
0       zlib
1       bzlib
2       lzo1x
3       header stripping
end of CONTENTCOMPALGO


5.5 Cluster

A CLUSTER contains multimedia data and usually spans a range of a few seconds. The following picture shows a typical cluster:

Although sticking to this order of the elements is not mandatory, it is recommended not to place any element other than BLOCKGROUP/SIMPLEBLOCK after the first BLOCKGROUP/SIMPLEBLOCK. Otherwise, the entire cluster would have to be read before it can be used, just because the timecode is stored at the end.

Table 16: The CLUSTER element, child of SEGMENT (→2)

TIMECODE: uint, # ≤ 1, ID: E7, def: 0
    The cluster timecode is the timecode that all block timecodes are relative to.
POSITION: uint, # ≤ 1, ID: A7
    The POSITION element indicates the position of the beginning of its parent element inside its grandparent element. This can help to resync in case of damaged data, but is of no use if no data is damaged.
PREVSIZE: uint, # ≤ 1, ID: AB
    Indicates the size of the preceding cluster in bytes. This helps to seek backwards and to find the preceding cluster without having to look at METASEEK or CUE data. It is also helpful for resyncing, e.g. if the EBML ID of the preceding CLUSTER is damaged.
BLOCKGROUP (→17): Master, # ≥ 0, ID: A0
    Contains a BLOCK along with some attached information such as references.
SIMPLEBLOCK: binary, # ≥ 0, ID: A3
    This is a BLOCK (→ section 6.1) without additional attached information. Since a SIMPLEBLOCK does not require a BLOCKGROUP around it, it causes less overhead. SIMPLEBLOCK is MATROSKA v2.
end of CLUSTER

Table 17: The BLOCKGROUP element, child of CLUSTER (→16)

BLOCK: binary, # = 1 (!), ID: A1
    Contains the data to be replayed. See section 6.1 for details.
REFERENCEBLOCK: int, # ≥ 0, ID: FB
    Timecode, relative to this BLOCK's timecode, of a frame that needs to be decoded before this BLOCK can be decoded.
BLOCKDURATION: int, # ≤ 1, ID: 9B
    Indicates the scaled duration of the BLOCK. If this value is not written, it is assumed to be (1) the difference <timecode of next block of the same stream> - <timecode>, or (2) equal to DEFAULTDURATION (for the last block of each stream).
    As a consequence, the BLOCKDURATION element is mandatory for every BLOCK of a subtitle track, unless a subtitle is indeed supposed to disappear only directly before the next one appears. But even then it is recommended to write BLOCKDURATION.
end of BLOCKGROUP
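The fallback rule for a missing BLOCKDURATION can be written down as a small function. The following Python sketch is only an illustration of the rule stated in Table 17; the function name and its parameters are assumptions made for this example, and None stands for "not present".

```python
from typing import Optional


def effective_duration(block_duration: Optional[int],
                       timecode: int,
                       next_timecode: Optional[int],
                       default_duration: Optional[int]) -> Optional[int]:
    """Duration a reader would assume for one block (illustrative only)."""
    if block_duration is not None:
        return block_duration              # value was written explicitly
    if next_timecode is not None:
        return next_timecode - timecode    # case (1): gap to the next block
    return default_duration                # case (2): last block of the stream


explicit = effective_duration(40, 1000, 1080, 33)
implied = effective_duration(None, 1000, 1080, 33)
last_block = effective_duration(None, 1000, None, 33)
```

For a subtitle block, the second case shows why a missing duration keeps the subtitle on screen until the next one starts.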


5.6 Cues

The CUES element contains information that is helpful (but not necessary) for seeking. Each piece of information, called a CUEPOINT, contains a timestamp and a list of pairs (track number, (cluster position[, block number within cluster])). Generally, a CUEPOINT should only point to keyframes.
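How a player might use such a list for seeking can be sketched in a few lines of Python. This is a hypothetical in-memory model, not an actual parser: the CuePoint class, the find_cue function and the sample values are invented for this example. It assumes the cue points are sorted by time and looks for the last one at or before the seek target that has an entry for the wanted track.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional


class CuePoint(NamedTuple):
    time: int        # CueTime
    positions: dict  # track number -> (cluster position, block number or None)


def find_cue(cues: list, track: int, target_time: int) -> Optional[CuePoint]:
    """Last cue point with time <= target_time that covers `track`."""
    times = [cue.time for cue in cues]
    index = bisect_right(times, target_time) - 1
    while index >= 0:
        if track in cues[index].positions:
            return cues[index]
        index -= 1
    return None


cues = [
    CuePoint(0, {1: (100, None)}),
    CuePoint(4000, {1: (52000, 1)}),
    CuePoint(8000, {1: (99000, None)}),
]
hit = find_cue(cues, track=1, target_time=5000)
```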


Table 18: The CUES element, child of SEGMENT (→2)

CUEPOINT (→19): Master, # ≥ 1, ID: BB
    One CUEPOINT contains one entry point (or a list of entry points, with one point per track) for one timecode.
end of CUES

Table 19: The CUEPOINT element, child of CUES (→18)

CUETIME: uint, # = 1 (!), ID: B3
    The timecode of the CLUSTERs or BLOCKs that are referred to by this CUEPOINT.
CUETRACKPOSITIONS (→20): Master, # ≥ 1, ID: B7
    A position where a CLUSTER or BLOCK with the timecode $CUETIME can be found.
end of CUEPOINT

Table 20: The CUETRACKPOSITIONS element, child of CUEPOINT (→19)

CUETRACK: uint, # ≥ 1 (!), ID: F7
    Track for which a position is given. This track number is the same as TRACKENTRY (→ Table 7)::TRACKNUMBER.
CUECLUSTERPOSITION: uint, # ≥ 1 (!), ID: F1
    The position of the cluster the referred block is found in. This position is relative to the SEGMENT's (→ Table 2) data section.
CUEBLOCKNUMBER: uint, # ≤ 1, ID: 53 78
    The block with timecode $CUETIME is the $CUEBLOCKNUMBER-th BLOCK/SIMPLEBLOCK inside the CLUSTER at position $CUECLUSTERPOSITION.
end of CUETRACKPOSITIONS


5.7 Chapters - Editions and ChapterAtoms

The CHAPTERS element contains a list of all editions and chapters found in this SEGMENT. Chapters in MATROSKA files are more powerful than chapters on DVDs; their handling is, however, considerably more complex.

Table 21: The CHAPTERS element, child of SEGMENT (→2)

EDITIONENTRY (→22): Master, # ≥ 1, ID: 45 B9
    One EDITIONENTRY describes one edition. Just like with TRACKENTRY (→ Table 7), you could theoretically spread information about one edition over different EDITIONENTRYs and use $EDITIONUID to find out which edition an EDITIONENTRY is referring to, but this is highly discouraged.
end of CHAPTERS

An edition contains one set of chapter definitions, so having several editions means having several sets of chapter definitions. Several editions are typically used when an edition serves as a playlist, i.e. its chapters are played one after the other even though there are gaps between them.

Table 22: The EDITIONENTRY element, child of CHAPTERS (→21)

EDITIONUID: uint, # ≤ 1, ID: 45 BC
    $EDITIONUID is the UID of the edition. This element is mandatory if you want to apply one or more titles to an edition.
EDITIONFLAGHIDDEN: bool, # ≤ 1, ID: 45 BD, def: 0
    When $EDITIONFLAGHIDDEN is 1, this edition should not be available via the user interface.
EDITIONFLAGDEFAULT: bool, # ≤ 1, ID: 45 DB, def: 0
    When $EDITIONFLAGDEFAULT is 1, this edition should be selected by the player as default.
EDITIONFLAGORDERED: bool, # ≤ 1, ID: 45 DD, def: 0
    When $EDITIONFLAGORDERED is 1, this edition contains a playlist. When $EDITIONFLAGORDERED is 0, it contains a simple DVD-like chapter definition.
CHAPTERATOM (→23): Master, # ≥ 1, ID: B6
    One CHAPTERATOM contains the definition of one chapter. This element is the only one in MATROSKA files that can contain itself recursively, in this case to define subchapters.
end of EDITIONENTRY

The following picture shows an ordered edition:


Table 23: The CHAPTERATOM element, child of EDITIONENTRY (→22), child of CHAPTERATOM (→23)

CHAPTERUID: uint, # = 1, ID: 73 C4
    The UID of this chapter. It must be unique within the file.
CHAPTERTIMESTART: uint, # ≤ 1, ID: 91, def: 0
    The unscaled timecode the chapter starts at. As the value is unsigned, a chapter cannot start earlier than at timecode 0, even though timecodes down to -30.000 are possible for multimedia data.
CHAPTERTIMEEND: uint, # ≤ 1, ID: 92
    The unscaled timecode the chapter ends at. The default value is the start of the next chapter, the end of the parent chapter, or the end of the segment, whichever exists, in that order.
CHAPTERFLAGHIDDEN: bool, # ≤ 1, ID: 98, def: 0
    When $CHAPTERFLAGHIDDEN is 1, the chapter should not be visible in the user interface, but should be played back normally.
CHAPTERFLAGENABLED: bool, # ≤ 1, ID: 45 98, def: 1
    When $CHAPTERFLAGENABLED is 0, the chapter should be skipped by the player.
CHAPTERSEGMENTUID: char[16], # ≤ 1, ID: 6E 67
    This element can only occur if $EDITIONFLAGORDERED=1. The SEGMENT whose UID is $CHAPTERSEGMENTUID is used instead of the current SEGMENT. Obviously, this SEGMENT should be easy to find, for example by being the first segment of a file in the same directory.
CHAPTERSEGMENTEDITIONUID: uint, # ≤ 1, ID: 6E BC
    The edition to use inside the SEGMENT selected via CHAPTERSEGMENTUID. The timecodes $CHAPTERTIMESTART and $CHAPTERTIMEEND refer to playback timecodes of that edition, i.e. the timecodes are relative to that playlist. This is called "nested editions" and is NOT SUPPORTED by Haali Media Splitter.
CHAPTERTRACKS (→24): Master, # ≤ 1, ID: 8F
    Contains a list of tracks the chapter applies to.
CHAPTERDISPLAY (→25): Master, # ≥ 0, ID: 80
    Contains all chapter titles.
end of CHAPTERATOM

A useful application for the CHAPTERFLAGHIDDEN element in connection with ordered editions is the following: you have a couple of episodes of a series, but want to save space by storing the intro and outro only once. You create one playlist (ordered edition) per episode, and another playlist playing all episodes in a row. Whereas in the first case you might want to play intro and outro for each episode, you might not want to do that in the second case.
If you don't want to make the three parts intro - movie - outro selectable via the user interface when playing single episodes, you call the intro chapter "Episode - blah" and hide the movie chapter and the outro chapter using $CHAPTERFLAGHIDDEN=1. Then, the playlist playing all episodes would be intro - episode 1 - episode 2 - ... - last episode - outro, whereas the other playlists would be intro - episode N - outro. The name of the intro chapter would be set to "Episode N".

Table 24: The CHAPTERTRACKS element, child of CHAPTERATOM (→23)

CHAPTERTRACKNUMBER: uint, # ≥ 1, ID: 89
    One number of a track a chapter is used with.
end of CHAPTERTRACKS


Table 25: The CHAPTERDISPLAY element, child of CHAPTERATOM (→23)

CHAPSTRING: utf-8, # ≤ 1, ID: 85
    A title of a chapter.
CHAPLANGUAGE: string, # ≥ 0, ID: 43 7C, def: eng
    The language of $CHAPSTRING as defined in ISO 639-2 (see footnote 5).
CHAPCOUNTRY: utf-8, # ≥ 0, ID: 43 7E
    A country the title is used in. For example, a German title used in Germany might differ from the title used in Austria.
end of CHAPTERDISPLAY

5 http://lcweb.loc.gov/standards/iso639-2/englangn.html#two


5.8 Attachments

Theoretically, any file type can be attached to a MATROSKA file. However, this possibility is usually used to attach pictures such as CD covers, or fonts required to display a subtitle track correctly. Obviously, attaching executable files would allow MATROSKA files to contain viruses, a scenario that is not exactly the intended application of attachments or of anything else MATROSKA is capable of.

Table 26: The ATTACHMENTS element, child of SEGMENT (→2)

ATTACHEDFILE (→27): Master, # ≥ 1, ID: 61 A7
    Describes and contains one attached file.
end of ATTACHMENTS

Table 27: The ATTACHEDFILE element, child of ATTACHMENTS (→26)

FILEDESCRIPTION: utf-8, # ≤ 1, ID: 46 7E
    A human-readable description of the file.
FILENAME: utf-8, # ≤ 1, ID: 46 6E
    The name that should be proposed by a demuxer when extracting the file.
FILEMIMETYPE: string, # ≤ 1, ID: 46 60
    MIME type of the file, like ...
FILEDATA: binary, # ≤ 1, ID: 46 5C
    The file itself.
FILEUID: uint, # = 1, ID: 46 AE
    The UID of that file, just like TRACKUID, CHAPTERUID etc. The UID is required if a TRACKENTRY (→ Table 7) wants to refer to this attachment.
end of ATTACHEDFILE


5.9 Tags

Table 28: The TAGS element, child of SEGMENT (→2)

TAG (→29): Master, # ≥ 1, ID: 73 73
    One TAG element describes one tag.
end of TAGS

TAGS provide additional information (see footnote 6) that is not important for replay. A TAGS element contains a number of TAG elements. Each TAG element contains a list of UIDs (usually TRACKUIDs or EDITIONUIDs) and a list of SIMPLETAGs, each one containing a name and a value:

6 http://www.matroska.org/technical/specs/tagging/index.html


If no TARGETS are specified, the TAG is a global TAG referring to the entire SEGMENT. Of course, two different TAG elements can contain identical TARGETS.

Table 29: The TAG element, child of TAGS (→28)

TARGETS (→30): Master, # ≤ 1, ID: 63 C0
    Describes which elements a tag applies to.
SIMPLETAG (→31): Master, # ≥ 1, ID: 67 C8
    Each SIMPLETAG contains one tag that applies to each target in TARGETS.
end of TAG

Note that there is nothing like a TAG UID.

Table 30: The TARGETS element, child of TAG (→29)

TARGETTYPEVALUE (→??): uint, # ≤ 1, ID: 68 CA, def: 50
    This number describes the logical level of the object the tag refers to.
TARGETTYPE: utf-8, # ≤ 1, ID: 63 CA
    A string describing the logical level of the object the tag is referring to.
TRACKUID: uint, # ≥ 0, ID: 63 C5
    The UID of a track the tag is referring to.
EDITIONUID: uint, # ≥ 0, ID: 63 C9
    The UID of an edition the tag is referring to. Note that this is the only way to apply titles to an edition.
CHAPTERUID: uint, # ≥ 0, ID: 63 C4
    The UID of a chapter the tag is referring to.
ATTACHMENTUID: uint, # ≥ 0, ID: 63 C6
    The UID of an attachment the tag is referring to.
end of TARGETS


Table 31: The SIMPLETAG element, child of TAG (→29)

TAGNAME: utf-8, # ≥ 1 (!), ID: 45 A3
    Name of the tag.
TAGLANGUAGE: string, # ≤ 1, ID: 44 7A, def: und
    $TAGLANGUAGE is the language of $TAGNAME. Note that the default here is 'und', whereas the default track / chapter title language is 'eng'.
TAGORIGINAL: bool, # ≤ 1, ID: 44 84, def: 1
    When 1, this title and language is the original title given to the item.
TAGSTRING: utf-8, # ≤ 1, ID: 44 87
    The value of the tag when it is a string.
TAGBINARY: binary, # ≤ 1, ID: 44 85
    The 'value' of the tag when it is a binary tag.
end of SIMPLETAG

5.9.1 A few common Tags

• TITLE, Target: EditionUID: used to define names for Editions. This is exactly
what you can see in the screenshot above.

• BPS, Target: TrackUID: used to define the bitrate of a track

• FPS, Target: TrackUID: used to define the framerate of a track


6 MATROSKA block Layout and Lacing

6.1 Basic layout of a Block

A MATROSKA block has the following format:

BLOCK {
    vint TrackNumber
    sint16 Timecode      // relative to Cluster timecode
    int8 Flags           // lacing, keyframe, discardable
    if (lacing) {
        int8 frame_count-1
        if (lacing == EBML lacing) {
            vint size[0]
            svint size[1..frame_count-2]
        } else
        if (lacing == Xiph lacing) {
            int8 size[size of <leading (frame_count-1) frames> / 255 + 1]
        }
    }
    int8[] data
}

The following bits are defined for FLAGS:

Bit 0x80: keyframe: No frame after this frame can reference any frame before this frame and vice versa (in AVC terms: this frame is an IDR frame). The frame itself doesn't reference any other frames.
Bits 0x06: lace type
    00 - no lacing
    01 - Xiph lacing
    11 - EBML lacing
    10 - fixed-size lacing
Bit 0x08: invisible: duration of this block is 0
Bit 0x01: discardable: this frame can be discarded if the decoder is slow
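To make the layout and the flag bits more concrete, here is a small Python sketch that reads the fixed part of a block header. It is an illustration only, not a full demuxer: the function names are invented for this example, and read_vint assumes the usual EBML convention that the number of leading zero bits of the first byte plus one gives the length, with the marker bit removed from the value.

```python
import struct

LACING = {0: "none", 1: "xiph", 2: "fixed", 3: "ebml"}  # value of the 0x06 bits


def read_vint(buf: bytes, pos: int = 0):
    """Read one unsigned vint; returns (value, position after it)."""
    first = buf[pos]
    if first == 0:
        raise ValueError("vints longer than 8 bytes are not handled here")
    length, mask = 1, 0x80
    while not first & mask:       # count leading zero bits
        mask >>= 1
        length += 1
    value = first & (mask - 1)    # drop the length marker bit
    for byte in buf[pos + 1:pos + length]:
        value = (value << 8) | byte
    return value, pos + length


def parse_block_header(buf: bytes) -> dict:
    track, pos = read_vint(buf)
    (timecode,) = struct.unpack_from(">h", buf, pos)  # signed, big-endian
    flags = buf[pos + 2]
    return {
        "track": track,
        "timecode": timecode,
        "keyframe": bool(flags & 0x80),
        "invisible": bool(flags & 0x08),
        "lacing": LACING[(flags & 0x06) >> 1],
        "discardable": bool(flags & 0x01),
        "header_size": pos + 3,
    }


header = parse_block_header(bytes([0x81, 0xFF, 0xFE, 0x86]))
```

The sample bytes describe track 1, a timecode of -2 relative to the cluster, and a keyframe that uses EBML lacing.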


The following flags are only defined for Matroska v2 and can thus only be used in a SIMPLEBLOCK: keyframe, invisible, discardable. The type of lacing in use defines how the SIZE values are to be read.

6.2 Lacing

Lacing is a technique that allows storing more than one atom of data (such as one audio frame) in one block, with the goal of decreasing overhead without losing the ability to separate the frames of a lace again later.

Generally, the size of the last frame in a lace is not stored, as it can be derived from the total block size, the size of the block header and the sum of the sizes of all other frames.
Frame duration values are not preserved! It is therefore highly recommended not to use lacing if the frame duration is not constant, as is the case with Vorbis audio.

6.2.1 Xiph Lacing

The size of each frame is coded as a sum of int8 values. A value smaller than 255 ends the current sum, i.e. the next value refers to the next frame.
Example
size = { 187, 255, 255, 120, 255, 0, 60 } means that there are 4 frames with 187, 630, 255, 60 bytes.
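The rule can be checked against the example with a few lines of Python. This is a sketch for illustration; the function names are invented here and the code only handles the size values, not the surrounding block.

```python
def decode_xiph_sizes(values):
    """Turn a sequence of Xiph lace bytes into frame sizes."""
    sizes, current = [], 0
    for value in values:
        current += value
        if value < 255:        # a value below 255 closes the current frame size
            sizes.append(current)
            current = 0
    if current:
        raise ValueError("size list ends in the middle of a frame size")
    return sizes


def encode_xiph_size(size):
    """Inverse direction for one frame size."""
    return [255] * (size // 255) + [size % 255]


sizes = decode_xiph_sizes([187, 255, 255, 120, 255, 0, 60])
```

Note that a frame of exactly 255 bytes needs two values (255, 0), which is why the example contains a 0.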

6.2.2 EBML Lacing

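The size of the first frame ("frame 0") of a lace is stored directly as size[0].

For every following frame i, the stored value size[i] is the signed difference between the size of frame i and the size of frame i-1, so the size of frame i is the size of frame i-1 plus size[i].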
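As a hedged illustration, the following Python sketch rebuilds the frame sizes of an EBML lace from an already decoded first size and the signed differences that follow it, and then derives the size of the last frame, which is not stored. The function names and the sample numbers are made up for this example; reading the vint and svint values from the byte stream is not shown.

```python
def decode_ebml_lace_sizes(first_size, deltas):
    """Sizes of all frames except the last one.

    first_size: decoded vint size[0]
    deltas:     decoded svint values size[1..], each relative to the
                size of the preceding frame
    """
    sizes = [first_size]
    for delta in deltas:
        sizes.append(sizes[-1] + delta)
    return sizes


def last_frame_size(lace_data_size, stored_sizes):
    """The last frame takes whatever remains of the lace data."""
    return lace_data_size - sum(stored_sizes)


stored = decode_ebml_lace_sizes(800, [-12, 30])
last = last_frame_size(3000, stored)
```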

6.2.3 Fixed Lacing

Fixed Lacing is used if all frames in a lace have the same size. Examples are AC3
or DTS audio. In this case, knowing the number of frames is enough to calculate
the size of one frame. Consequently, there are no size values.
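A minimal Python sketch of this calculation, assuming the lace data (everything after the block header and the frame count byte) is already available as a byte string; split_fixed_lace is a name chosen for this example.

```python
def split_fixed_lace(data: bytes, frame_count: int):
    """Split lace data into frame_count frames of equal size."""
    frame_size, remainder = divmod(len(data), frame_count)
    if remainder:
        raise ValueError("data size is not a multiple of the frame count")
    return [data[i * frame_size:(i + 1) * frame_size]
            for i in range(frame_count)]


frames = split_fixed_lace(b"aabbcc", 3)
```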


7 Overhead of MATROSKA files

The purpose of this section is to explain how to predict the overhead of a MATROSKA file before muxing, and without analysing any of the source files excessively. This section assumes that BLOCKGROUPs and BLOCKs are used, and that no SIMPLEBLOCKs are used. If you want to estimate the overhead of files that use SIMPLEBLOCKs, you get about the same overhead as with BLOCKs without BLOCKDURATION, REFERENCEBLOCK or BLOCKGROUP.

7.1 Overhead of BLOCKGROUPs

First, here again is the layout of a typical BLOCKGROUP:

BlockGroup <size>
Block <size> <number, flag, timecode>
[ Reference <size> <val> ]

The EBML IDs of BLOCKs and BLOCKGROUPs are 1 byte each, so that the structure above, not counting REFERENCEs, takes:

• BlockGroup < 128 bytes: 8 bytes

• BlockGroup < 16 kBytes: 10 bytes

• BlockGroup < 2 MBytes: 12 bytes

BLOCKGROUPs larger than 2 MBytes are extremely unlikely, and even BLOCKGROUPs larger than 16 kBytes won't occur often, compared to BLOCKGROUPs between 128 bytes and 16 kBytes. That means that assuming an overhead of 10 bytes for BLOCKGROUPs without REFERENCEs usually results in a good approximation.
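The three figures can be reproduced with a short Python sketch. It is an approximation under stated assumptions, not an exact muxer model: the size field is taken to grow at the thresholds used in the list (128 bytes, 16 kBytes, 2 MBytes), the block header is assumed to be 4 bytes (1 byte track number, 2 bytes timecode, 1 byte flags), and each REFERENCE is assumed to cost 3 bytes, which is what the per-frame figures in section 7.1.1 imply. All names are invented for this example.

```python
def size_field_length(payload_size: int) -> int:
    """Bytes needed for an EBML size field, using the thresholds of the list."""
    length = 1
    while payload_size >= 1 << (7 * length):
        length += 1
    return length


def blockgroup_overhead(frame_size: int, references: int = 0) -> int:
    block_payload = 4 + frame_size              # block header + frame data
    block = 1 + size_field_length(block_payload) + block_payload
    group_payload = block + 3 * references      # assumed 3 bytes per reference
    total = 1 + size_field_length(group_payload) + group_payload
    return total - frame_size


small = blockgroup_overhead(100)
typical = blockgroup_overhead(1000)
large = blockgroup_overhead(20000)
```

Close to the thresholds, the two size fields can differ in length, so intermediate values such as 9 bytes are possible as well.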

7.1.1 video

In a typical video stream, there are a lot of frames with 1 REFERENCE (P-frames, delta frames) and a few keyframes. Typical ratios are 100:1. There might also be frames with 2 REFERENCEs (B-frames), e.g. in native MPEG4 streams. Assuming a ratio of 66:33:1 for B:P:K, and assuming a bitrate far below 3,2 MBit/s (meaning that typical B- and P-frames are smaller than 16 kB), that causes about 15 bytes of overhead per frame. If there are no B-frames, there are about 13 bytes per frame.

Example: 2 hours, 25 fps.

These are 180,000 frames. At about 13 bytes per frame (no B-frames), the video stream will cause around 2,3 MB of overhead.
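The arithmetic behind these figures can be sketched in Python as follows. The per-frame values are assumptions derived from the text: 10 bytes for a BLOCKGROUP without REFERENCEs and about 3 bytes per REFERENCE; the function names are invented for this example.

```python
def average_overhead(ratio_b: int, ratio_p: int, ratio_k: int,
                     base: int = 10, per_reference: int = 3) -> float:
    """Average overhead per frame for a given B:P:K frame ratio."""
    total_frames = ratio_b + ratio_p + ratio_k
    total_bytes = (ratio_b * (base + 2 * per_reference)    # B-frames: 2 references
                   + ratio_p * (base + 1 * per_reference)  # P-frames: 1 reference
                   + ratio_k * base)                       # keyframes: none
    return total_bytes / total_frames


def stream_overhead(duration_s: int, fps: int, bytes_per_frame: int) -> int:
    return duration_s * fps * bytes_per_frame


with_b_frames = average_overhead(66, 33, 1)
two_hours = stream_overhead(2 * 3600, 25, 13)
```

With B-frames the average comes out just below 15 bytes per frame; without them, two hours at 25 fps give 2,340,000 bytes.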

7.1.2 audio - without lacing

As audio usually does not have any REFERENCEs (all audio frames are keyframes), one audio frame will take 8 or 10 bytes of overhead. For MP3, AC3, DTS and AAC, frames causing 8 bytes of overhead are unlikely. They are more likely for Vorbis.

Example: MP3 audio, 24 ms per frame, duration: 2 h

These are 300,000 frames with 10 bytes of overhead each, so this stream will cause 3 MB of overhead.

7.1.3 audio - with lacing

1. CBR+CFR: fixed lacing


In this case, fixed lacing (see section 6.2.3) is used. With fixed lacing, the overhead is the normal BLOCKGROUP overhead plus 1 byte for the lace header. Assuming that BLOCKGROUPs are not larger than 16 kB, the overhead per frame is equal to 11 / frame_count.

Example: AC3 audio, 448 kbps, 1792 bytes per frame, 32 ms per frame
1.) 8 frames per lace.
overhead for one frame = 11/8 = 1,375 bytes = 1 byte / 23,3 ms.
2.) 9 frames per lace.
overhead for one frame = 11/9 = 1,222 bytes = 1 byte / 26,2 ms.
3.) 10 frames per lace (the BLOCKGROUP is now larger than 16 kB, so the base overhead is 13 bytes).
overhead for one frame = 13/10 = 1,3 bytes = 1 byte / 24,6 ms.

An AC3 stream of 2 hours with 9 frames per lace will cause 270 kB of overhead.
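The three cases of the example can be recomputed with the following Python sketch. It assumes, as the text does, a base of 11 bytes (10 bytes of BLOCKGROUP overhead plus 1 byte lace header) while the lace stays below 16 kB, and 13 bytes above that; the function name is chosen for this illustration only.

```python
def fixed_lace_overhead_per_frame(frame_size: int, frame_count: int) -> float:
    """Overhead in bytes per frame for fixed lacing."""
    lace_size = frame_size * frame_count
    base = 11 if lace_size < 16 * 1024 else 13
    return base / frame_count


case_8 = fixed_lace_overhead_per_frame(1792, 8)
case_9 = fixed_lace_overhead_per_frame(1792, 9)
case_10 = fixed_lace_overhead_per_frame(1792, 10)

# 2 hours of AC3 at 32 ms per frame, 9 frames per lace
frames_in_two_hours = 2 * 3600 * 1000 // 32
total_bytes = round(frames_in_two_hours * case_9)
```

The total is 275,000 bytes, which matches the roughly 270 kB stated above.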

2. no CBR, but almost all frames smaller than 255 bytes: XIPH lacing
In this case, XIPH lacing (see section 6.2.1) is used, meaning that the overhead of a BLOCKGROUP is equal to the normal BLOCKGROUP overhead + frame_count, so that the overhead per frame is about (11+frame_count)/frame_count if there are frame_count frames in each lace. Again, if the BLOCKGROUPs are larger than 16 kBytes, the overhead is (13+frame_count)/frame_count.
In other words, the ratio in bytes / frame will always be between about 1,2 and 2,5 for audio streams with mainly small frames.

Although XIPH lacing is also defined for larger frames, EBML lacing is usually more efficient then.

3. otherwise: EBML lacing
Assuming that the difference in size between 2 consecutive frames is smaller than 8191, 1 or 2 bytes are needed to code the size of each frame, in addition to the normal BLOCKGROUP overhead.

As a result, we get 3 possible estimations:

a) worst case
A lace with frame_count frames using EBML lacing will cause not more than ((11 or 13)+2*frame_count)/frame_count bytes of overhead per frame.
Example 1: 16 frames per lace, BLOCKGROUP > 16 kB, worst case:
overhead <= (13 + 2*16)/16 = 2,8 bytes / frame.
Example 2: 8 frames per lace, BLOCKGROUP < 16 kB, worst case:
overhead <= (11 + 2*8)/8 = 3,4 bytes / frame.
b) best case
The best case is obviously that 2 consecutive frames differ by not more than 62 bytes. In that case, one byte is needed to code the size of one frame. However, the first frame might need two bytes if it is larger than 126 bytes.
Example 1: 16 frames per lace, BLOCKGROUP > 16 kB, best case:
overhead <= (13 + 1*16)/16 = 1,8 bytes / frame.
Example 2: 8 frames per lace, BLOCKGROUP < 16 kB, best case:
overhead <= (11 + 1*8)/8 = 2,4 bytes / frame.
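Both bounds follow from the same formula, which the Python sketch below evaluates for the four examples. The function name and the large_blockgroup parameter are invented for this illustration; the constants 11 and 13 are the base overheads used in the text.

```python
def ebml_lace_overhead_bounds(frame_count: int, large_blockgroup: bool = False):
    """Return (best case, worst case) overhead in bytes per frame."""
    base = 13 if large_blockgroup else 11   # BlockGroup above / below 16 kB
    best = (base + 1 * frame_count) / frame_count    # 1 byte per frame size
    worst = (base + 2 * frame_count) / frame_count   # 2 bytes per frame size
    return best, worst


bounds_16_large = ebml_lace_overhead_bounds(16, large_blockgroup=True)
bounds_8_small = ebml_lace_overhead_bounds(8)
```

The unrounded values are 1,8125 and 2,8125 bytes per frame for 16 frames, and 2,375 and 3,375 bytes per frame for 8 frames.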
c) average case
This is the case you need for optimal overhead prediction. Unfortunately, the average case depends on the compression format of the corresponding audio track, its bitrate, and maybe even the encoder that has been used. The easiest way to gather data on the average case of EBML lace header overhead is to simulate the lacing of different files that are likely to be used. Candidates are MPEG 1/2/4 audio and Vorbis, but not AC3 or DTS.
I have run a simulation with the following file types:
MPEG 1 Layer 3 (128 and 192 kbps, 48 kHz), HE-AAC (224 kbps and 96 kbps, 44,1 kHz), LC-AAC (268 kbps, 44,1 kHz)
The results obtained from those files are discussed below. The lace behaviour simulation has been run using mls (short for 'matroska lace simulator', see footnote 7). Note that, for results as accurate as possible, the simulation would have to be run and evaluated as follows for each audio format, at each bitrate, and maybe even with each encoder.

7 http://www-user.tu-chemnitz.de/~noe/Video-Zeug/mls/

The results for the lace header size are as follows:
Lace header overhead per frame @ <x> frames per lace
Audio format          4     8     12    16    24    32    48    64    96
MP3 @ 128 kbps        1,39  1,29  1,26  1,24  1,22  1,22  1,21  1,20  1,20
MP3 @ 192 kbps        1,50  1,41  1,38  1,37  1,36  1,35  1,34  1,34  1,33
HE-AAC @ 224 kbps     1,39  1,29  1,25  1,24  1,22  1,21  1,20  1,20  1,20
HE-AAC @ 64 kbps      1,34  1,23  1,19  1,18  1,16  1,15  1,14  1,14  1,13
LC-AAC @ 268 kbps     1,31  1,19  1,16  1,14  1,12  1,11  1,10  1,09  1,09
Applications using libmatroska for MATROSKA file creation use 8 frames per lace. As a consequence, the overhead for a track using EBML lacing can be predicted with acceptable accuracy if the audio format is known.
As you can also see, beyond a certain lace size, larger laces hardly reduce the overhead caused by the lace headers of BLOCKs.
However, larger laces mean fewer BLOCKs and thus fewer BLOCKGROUPs, so the total overhead per frame, including the overhead caused outside of the BLOCKs, is worth a look. Here are the results with the same test files as above:
Overhead per frame @ <x> frames per lace
Audio format          4     8     12    16    24    32    48    64    96
MP3 @ 128 kbps        4,14  2,67  2,17  1,93  1,68  1,56  1,48  1,41  1,33
MP3 @ 192 kbps        4,25  2,79  2,30  2,06  1,81  1,75  1,61  1,54  1,47
HE-AAC @ 224 kbps     4,14  2,66  2,23  2,05  1,76  1,62  1,48  1,40  1,33
HE-AAC @ 64 kbps      4,09  2,61  2,11  1,86  1,62  1,49  1,40  1,34  1,27
LC-AAC @ 268 kbps     4,06  2,57  2,07  1,82  1,66  1,51  1,37  1,30  1,22
Now let's take the 2nd table and find out how much overhead that means in a real movie of 2 hours.
In the case of the MP3 files used in that example, one frame lasts 24 ms. In the case of our LC-AAC file, one frame lasts 23,22 ms, and for the HE-AAC file we get 46,44 ms.
Thus a file of 2 hours will have the following number of frames:
MP3 - 300,000
LC-AAC - 310,000
HE-AAC - 155,000.


First, let's use the default setting of libmatroska (8 frames per lace) and calculate the overhead a muxing app using libmatroska would cause when muxing those files into a movie:

• MP3 @ 128: overhead = 300,000 * 2,67 = 801,000 bytes

• MP3 @ 192: overhead = 300,000 * 2,79 = 837,000 bytes

• HE-AAC @ 224: overhead = 155,000 * 2,66 = 412,300 bytes

• LC-AAC @ 268: overhead = 310,000 * 2,57 = 796,700 bytes
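The same multiplication can be written as a short Python sketch. The frame counts and per-frame values are copied from the text and from the column for 8 frames per lace in the second table; the dictionary and function names are invented for this example.

```python
# Rounded frame counts for a 2 hour file, as listed above.
FRAMES_IN_2H = {"MP3": 300_000, "LC-AAC": 310_000, "HE-AAC": 155_000}

# Total overhead per frame at 8 frames per lace (second table).
OVERHEAD_AT_8 = {
    ("MP3", 128): 2.67,
    ("MP3", 192): 2.79,
    ("HE-AAC", 224): 2.66,
    ("HE-AAC", 64): 2.61,
    ("LC-AAC", 268): 2.57,
}


def total_overhead(audio_format: str, kbps: int) -> int:
    """Overhead in bytes for a 2 hour track at 8 frames per lace."""
    per_frame = OVERHEAD_AT_8[(audio_format, kbps)]
    return round(FRAMES_IN_2H[audio_format] * per_frame)


mp3_128 = total_overhead("MP3", 128)
he_aac_224 = total_overhead("HE-AAC", 224)
```

Dividing these byte counts by 1024 gives the kB figures used for the 8 frames per lace column in the comparison that follows.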

With 24 frames per lace, an MP3 block would have a duration of 576 ms, an HE-AAC block even about 1 second. That means that, when seeking in a file, the audio could appear to be missing for a moment. Thus, laces longer than 1 second are highly discouraged. Nevertheless, let's analyze the overhead in our file for laces of 24 and 96 frames each, and compare it to the overhead caused by libmatroska. Here is the corresponding table:

                      Frames per lace
Audio format          8       24      96
MP3 @ 128 kbps        782kB   492kB   389kB
MP3 @ 192 kbps        817kB   530kB   430kB
HE-AAC @ 224 kbps     402kB   266kB   201kB
HE-AAC @ 64 kbps      395kB   245kB   192kB
LC-AAC @ 268 kbps     778kB   502kB   369kB

As you can see, putting 24 frames in one BLOCK, compared to 8 frames, saves some overhead. However, putting 96 frames in one BLOCK instead of 24 saves less overhead than 24 compared to 8. As 96 frames per lace will usually cause uncomfortable seeking, it is recommended not to put more than about 24 frames in one BLOCK.


7.2 Overhead of CLUSTERs

Although most of the overhead is caused by BLOCKGROUPs, the amount of overhead caused by CLUSTERs themselves is noticeable as well.
Here again is the basic layout of a CLUSTER:

Cluster <size>
[ CRC32 ]
TimeCode <size> <timecode>
[ PrevClusterSize <size> <prevsize> ]
[ Position <size> <position> ]
{ BlockGroup }

First, some conventions:

• each CLUSTER has a size between 16 kB and 2 MB

• each CLUSTER begins at a position between 16 MB and 4 GB

As typical movie files are designed to fit on 1 or 2 CDs, or so that 2 or 3 of them fill one DVD, point 2 will be true for most of the clusters in typical files.
With the abovementioned restrictions on CLUSTERs, the overhead inside one CLUSTER will be:

• CLUSTER ID + <size>: 7 bytes

• CRC32: 6 bytes

• TIMECODE: 5 bytes

• PREVCLUSTERSIZE: 5 bytes

• POSITION: 5 bytes

• SEEKHEAD entry for CLUSTER: 17 bytes

Depending on the muxing settings, the overhead caused by one CLUSTER will be between 12 bytes (CLUSTER ID + <size> and TIMECODE only) and 45 bytes (all of the above).
Example: Assuming a size of 1 MB per CLUSTER, that means an overhead rate of 0,001% - 0,005%, or up to 100 kB in a file of 2 GB.
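The range of 12 to 45 bytes can be reproduced by adding up the items of the list. In the Python sketch below, treating the CLUSTER header and the TIMECODE as always present and everything else as optional is an assumption inferred from the lower bound of 12 bytes; the names are invented for this example.

```python
ALWAYS_PRESENT = {"cluster_id_and_size": 7, "timecode": 5}
OPTIONAL = {"crc32": 6, "prev_cluster_size": 5, "position": 5,
            "seekhead_entry": 17}


def cluster_overhead(enabled=()) -> int:
    """Bytes of overhead per cluster for the chosen optional items."""
    return sum(ALWAYS_PRESENT.values()) + sum(OPTIONAL[name] for name in enabled)


def overhead_rate_percent(overhead_bytes: int, cluster_size: int) -> float:
    return 100.0 * overhead_bytes / cluster_size


minimum = cluster_overhead()
maximum = cluster_overhead(OPTIONAL)   # iterating the dict yields all names
clusters_in_2gb = (2 * 1024 ** 3) // (1024 ** 2)
worst_total = clusters_in_2gb * maximum
```

With 1 MB clusters, a 2 GB file holds 2048 clusters, so the worst case stays a little below the 100 kB mentioned above.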


7.3 Overhead caused by Cues

Here again is the layout of a CUEPOINT:

CuePoint <size>
CueTime <size> <time>
{ CueTrackPosition <size>
CueClusterPosition <size> <position>
CueTrack <size> <track>
[ CueBlockNumber <size> <block number> ]
}

Assuming that a CUEPOINT only points into one certain track, the overhead is:

• CuePoint: 2 bytes

• CueTime: 5 bytes

• CueTrackPosition: 2 bytes

• CueClusterPosition: 6 bytes

• CueTrack: 3 bytes

• CueBlockNumber: 4 bytes

Total: 22 bytes.

Example: Assuming that there is a CUEPOINT every 4 seconds (1 keyframe in 100 frames), this adds an overhead of 0,22 bytes / frame.
There can also be CUEPOINTs for audio tracks. In that case, as every frame is a keyframe, the number of CUEPOINTs only depends on the muxing application. Predicting the overhead requires knowing its behaviour.
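For completeness, the cue calculation as a short Python sketch, using the byte counts from the list above. The names are invented for this example, and the result only holds when each CUEPOINT points into a single track.

```python
CUE_POINT_PARTS = {
    "CuePoint": 2,
    "CueTime": 5,
    "CueTrackPosition": 2,
    "CueClusterPosition": 6,
    "CueTrack": 3,
    "CueBlockNumber": 4,
}
CUE_POINT_BYTES = sum(CUE_POINT_PARTS.values())


def cue_overhead_per_frame(frames_per_cue_point: int) -> float:
    """Cue overhead spread over all frames between two cue points."""
    return CUE_POINT_BYTES / frames_per_cue_point


video_example = cue_overhead_per_frame(100)   # 1 keyframe in 100 frames
```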


8 Links

Matroska pages / software:

http://www.matroska.org
http://haali.cs.msu.ru/mkv/
http://www-user.tu-chemnitz.de/~noe/Video-Zeug/
http://de.wikipedia.org/wiki/Matroska
http://www.matroska.info/
http://ld-anime.faireal.net/guide/jargon.matroska-en
