Matroska-File Format
Contents
1 Introduction
2 EBML - basics
2.1 Unsigned Integer Values of Variable Length ("vint")
2.2 EBML elements
2.3 Signed Integer Values of Variable Length (svint)
2.4 Data Types
Matroska file format
8 Links
List of Tables
1 EBML
2 Segment
3 SegmentInfo
4 SeekHead
5 Seek
6 Tracks
7 TrackEntry
8 Video
9 Audio
10 ContentEncodings
11 ContentEncoding
12 ContentCompression
16 Cluster
17 BlockGroup
18 Cues
19 CuePoint
20 CueTrackPositions
21 Chapters
22 EditionEntry
23 ChapterAtom
24 ChapterTracks
25 ChapterDisplay
26 Attachments
27 AttachedFile
28 Tags
29 Tag
30 Targets
31 SimpleTag
1 Introduction
please send me an e-mail (include 'matroska' in the subject!). You can contact me in
German, English or French, whatever you prefer. Just don't ask me whether you may ask
something, or whether I could document some Digital Restrictions Management.
This document is powered by LaTeX, so changing the order of certain tables or the
style of those tables etc. is, to a certain extent, possible within a few seconds.
Screenshots of real-life file structures are used to illustrate the file structure. All of
them have been made using the EBML Tree Viewer in AVI-Mux GUI.
2 EBML - basics
EBML files use integers of variable size. This way, the file format doesn't waste
space by storing 32 or even 64 bit integers in places where values that large only
occur occasionally. The way the size is coded is inspired by the UTF-8 encoding format.
The Matroska file format does not allow vints longer than 8 bytes, meaning
that the number of leading zero bits is not higher than 7 and that the total length can
always be determined from the first byte.
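The following C sketch shows how a reader might decode a vint under these rules: the position of the first 1 bit in the first byte (the length marker) gives the total length, and the remaining bits, with the marker removed, form the value in big endian order. The function name ebml_read_vint and its error convention are hypothetical, not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decodes one vint starting at p, with len bytes
   available. Stores the value (length marker removed) in *value and returns
   the vint length in bytes (1..8), or -1 if the data is too short or the
   first byte is 0x00, which would announce a length greater than 8. */
int ebml_read_vint(const uint8_t *p, size_t len, uint64_t *value)
{
    assert(p != NULL && value != NULL);
    if (len == 0 || p[0] == 0x00)
        return -1;

    /* The number of leading zero bits plus one is the total length. */
    int length = 1;
    uint8_t marker = 0x80;
    while ((p[0] & marker) == 0) {
        marker >>= 1;
        length++;
    }
    if ((size_t)length > len)
        return -1;

    /* Bits below the marker in the first byte are the top value bits. */
    uint64_t v = p[0] & (uint8_t)(marker - 1);
    for (int i = 1; i < length; i++)
        v = (v << 8) | p[i];

    *value = v;
    return length;
}
```

For example, the two bytes 41 F4 form a 2 byte vint (one leading zero bit) with the value 500.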
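#include <stdint.h>

typedef struct {
    uint64_t ID;               /* EBML-ID (marker included), coded as a vint of s_ID bytes */
    uint64_t size;             /* size of data, coded as a vint of s_size bytes */
    const unsigned char *data; /* the 'size' bytes of element data */
} EBML_ELEMENT;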
The length of ID shall be called s_ID, the length of size shall be called s_size.
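A sketch of reading an element header, assuming the usual EBML convention that the ID is kept together with its length marker (so the DocType ID reads as 42 82, exactly as listed in the element tables of this document), whereas the size has its marker removed. ElementHeader and parse_element_header are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result of parsing an element header. */
typedef struct {
    uint64_t id;     /* EBML-ID including its length marker, e.g. 0x4282 */
    uint64_t size;   /* size of the data section, marker removed */
    int s_id;        /* length of the ID vint in bytes */
    int s_size;      /* length of the size vint in bytes */
} ElementHeader;

/* Returns the vint length (1..8) announced by the first byte, or -1. */
static int vint_length(uint8_t first)
{
    for (int n = 1; n <= 8; n++)
        if (first & (0x80 >> (n - 1)))
            return n;
    return -1; /* 0x00: length > 8, not allowed */
}

/* Parses ID and size at p; returns s_ID + s_size, or -1 on error. */
int parse_element_header(const uint8_t *p, size_t len, ElementHeader *h)
{
    assert(p != NULL && h != NULL);
    if (len == 0 || (h->s_id = vint_length(p[0])) < 0 || (size_t)h->s_id >= len)
        return -1;

    /* The ID is kept as raw bytes, marker included. */
    h->id = 0;
    for (int i = 0; i < h->s_id; i++)
        h->id = (h->id << 8) | p[i];

    const uint8_t *q = p + h->s_id;
    if ((h->s_size = vint_length(q[0])) < 0 || (size_t)(h->s_id + h->s_size) > len)
        return -1;

    /* The size has its marker bit removed. */
    h->size = q[0] & (0xFF >> h->s_size);
    for (int i = 1; i < h->s_size; i++)
        h->size = (h->size << 8) | q[i];

    return h->s_id + h->s_size;
}
```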
Elements that contain other EBML Elements are called EBML Master elements.
Generally, the order of EBML elements inside a parent element is not fixed. In
some cases, a certain order is recommended, but it is never mandatory. In particular,
no element order should be assumed inside small parent elements.
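Signed vints (svints) have the following value: read the vint as an unsigned integer
(with the length marker removed) and then subtract
vsint_subtr[length-1]
where length is the length of the vint in bytes and
#include <stdint.h>
/* 2^(7*length - 1) - 1: half of the value range of a vint of that length */
static const int64_t vsint_subtr[] = {
    0x3F, 0x1FFF, 0x0FFFFF, 0x07FFFFFF,
    0x03FFFFFFFF, 0x01FFFFFFFFFF,
    0x00FFFFFFFFFFFF, 0x007FFFFFFFFFFFFF };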
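As an illustration, here is a hypothetical ebml_read_svint that applies this rule; it repeats the vsint_subtr table so that it compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values to subtract, indexed by vint length - 1: 2^(7*length - 1) - 1. */
static const int64_t vsint_subtr[] = {
    0x3F, 0x1FFF, 0x0FFFFF, 0x07FFFFFF,
    0x03FFFFFFFF, 0x01FFFFFFFFFF,
    0x00FFFFFFFFFFFF, 0x007FFFFFFFFFFFFF };

/* Hypothetical helper: decodes a signed vint. Returns its length in bytes
   (1..8) and stores the signed value in *out, or returns -1 on error. */
int ebml_read_svint(const uint8_t *p, size_t len, int64_t *out)
{
    assert(p != NULL && out != NULL);
    if (len == 0 || p[0] == 0x00)
        return -1;

    int length = 1;
    while ((p[0] & (0x80 >> (length - 1))) == 0)
        length++;
    if ((size_t)length > len)
        return -1;

    /* Read as an unsigned vint with the marker removed ... */
    uint64_t raw = p[0] & (0xFF >> length);
    for (int i = 1; i < length; i++)
        raw = (raw << 8) | p[i];

    /* ... then subtract vsint_subtr[length - 1]. */
    *out = (int64_t)raw - vsint_subtr[length - 1];
    return length;
}
```

A 1 byte svint therefore covers -63 to 64, and a 2 byte svint -8191 to 8192.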
Whereas vints are used in the header section of EBML elements, the data types
described in this section occur in the data section.
Integers, signed as well as unsigned, are stored in big endian byte order, with
leading 0x00 bytes (in case of positive values) and 0xFF bytes (in case of negative
values) being cut off; for signed values, only as long as the most significant remaining
bit still indicates the correct sign (example for int: -257 is 0xFE 0xFF). An int/uint
may not be larger than 8 bytes.
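A minimal sketch of reading such values; the function names are hypothetical. The signed variant sign-extends from the most significant bit of the first stored byte, which is what makes cutting off leading 0xFF bytes reversible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: big endian uint of 0..8 bytes (0 bytes -> 0). */
uint64_t ebml_read_uint(const uint8_t *p, size_t len)
{
    assert(len <= 8);
    uint64_t v = 0;
    for (size_t i = 0; i < len; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Hypothetical helper: big endian two's complement int of 0..8 bytes.
   Leading 0xFF/0x00 bytes may have been cut off, so the value is
   sign-extended from the top bit of the first stored byte. */
int64_t ebml_read_int(const uint8_t *p, size_t len)
{
    assert(len <= 8);
    if (len == 0)
        return 0;                                 /* nothing stored */
    uint64_t v = (p[0] & 0x80) ? UINT64_MAX : 0;  /* pre-fill with sign bits */
    for (size_t i = 0; i < len; i++)
        v = (v << 8) | p[i];
    int64_t result;
    memcpy(&result, &v, sizeof result);           /* reinterpret as two's complement */
    return result;
}
```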
2.4.2 Float
A Float value is a 32 or 64 bit real number, as defined in IEEE 754. 80 bit values used
to be part of the specification, but have been removed and should not be used. The
bytes are stored in big endian order.
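A sketch of decoding a Float element, assuming the platform's float and double are IEEE 754 binary32 and binary64 (true on common platforms); ebml_read_float is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decodes a big endian IEEE 754 float of 4 or 8 bytes
   into *out. Returns 0 on success, -1 for any other length (including the
   removed 80 bit format). */
int ebml_read_float(const uint8_t *p, size_t len, double *out)
{
    assert(p != NULL && out != NULL);
    uint64_t bits = 0;
    for (size_t i = 0; i < len && i < 8; i++)
        bits = (bits << 8) | p[i];

    if (len == 4) {
        uint32_t b32 = (uint32_t)bits;
        float f;
        memcpy(&f, &b32, sizeof f);   /* reinterpret the bit pattern */
        *out = f;
        return 0;
    }
    if (len == 8) {
        double d;
        memcpy(&d, &bits, sizeof d);
        *out = d;
        return 0;
    }
    return -1;
}
```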
3.1 EBML
This header describes the contents of an EBML file. There should be only one
EBML header per file. Additional EBML headers do not render a file invalid,
but shall be ignored by any application reading the file. Files with more than one
EBML header can be created, for instance, when two or more files are concatenated
using the copy /b command.
3.2 Segment
A Segment contains multimedia data, as well as any header data necessary for
playback. There can be several Segments in one Matroska file, but this is discouraged,
as not many tools are able to handle multi-segment Matroska files correctly. If you
want to play multi-segment Matroska files on Windows, please use Haali Media
Splitter [2].
[2] http://haali.cs.msu.ru/mkv/
The EBML top level element contains a description of the file type, such as EBML
version, file type name, file type version etc.
If this header is missing, the file type has to be guessed.
EBMLMaxIDLength (uint, # ≤ 1, ID: 42 F2, def: 4)
  indicates the length of the longest EBML-ID the file contains. In the case of Matroska,
  this value is 4. Any EBML-ID longer than the value of this element shall be considered
  invalid.
EBMLMaxSizeLength (uint, # ≤ 1, ID: 42 F3, def: 8)
  indicates the maximum s_size value the file contains. Any EBML element having an
  s_size value greater than EBMLMaxSizeLength should be considered invalid.
DocType (string, # ≤ 1, ID: 42 82, def: matroska)
  describes the contents of the file. In the case of a Matroska file, its value is 'matroska'.
DocTypeVersion (uint, # ≤ 1, ID: 42 87, def: 1)
  indicates the version of the $DocType writer used to create the file.
DocTypeReadVersion (uint, # ≤ 1, ID: 42 85, def: 1)
  indicates the minimum version number a $DocType parser must be compliant with to
  read the file.
As you can see, in the case of Matroska files, all child elements of the EBML
element have a default value. Thus, an empty EBML element would technically
introduce a Matroska file (with file type version 1, maximum ID length 4, maximum
size length 8 etc.) correctly. However, I don't recommend pushing the
specification this far.
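Writing works the other way round. The sketch below serializes a string element such as DocType, assuming that the size is written as the shortest vint whose value bits are not all ones (EBML reserves the all-ones pattern for an unknown size). The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes 'size' as the shortest vint whose value bits are not all ones.
   Returns the number of bytes written (1..8), or 0 if it does not fit. */
static size_t write_size_vint(uint8_t *out, size_t cap, uint64_t size)
{
    size_t n = 1;
    while (n < 8 && size >= (UINT64_C(1) << (7 * n)) - 1)
        n++;
    if (size >= (UINT64_C(1) << (7 * n)) - 1 || n > cap)
        return 0;
    for (size_t i = 0; i < n; i++)               /* big endian value bytes */
        out[n - 1 - i] = (uint8_t)(size >> (8 * i));
    out[0] |= (uint8_t)(0x80 >> (n - 1));        /* set the length marker */
    return n;
}

/* Hypothetical helper: writes ID, size and data of a string element.
   'id' includes its length marker (e.g. 0x4282 for DocType).
   Returns the total number of bytes written, or 0 if 'cap' is too small. */
size_t ebml_write_string_element(uint8_t *out, size_t cap, uint32_t id, const char *s)
{
    assert(out != NULL && s != NULL && id != 0);
    size_t s_id = 4;
    while (s_id > 1 && (id >> (8 * (s_id - 1))) == 0)   /* skip leading zero bytes */
        s_id--;
    if (cap < s_id)
        return 0;
    for (size_t i = 0; i < s_id; i++)
        out[i] = (uint8_t)(id >> (8 * (s_id - 1 - i)));

    size_t len = strlen(s);
    size_t s_size = write_size_vint(out + s_id, cap - s_id, len);
    if (s_size == 0 || cap - s_id - s_size < len)
        return 0;
    memcpy(out + s_id + s_size, s, len);
    return s_id + s_size + len;
}
```

Called with the ID 0x4282 and the string "matroska", this produces the 11 bytes 42 82 88 followed by the 8 characters.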
It is not recommended to use either IDs or size values greater than 8 bytes. While
it’s clear that 8 bytes are enough to represent any size of anything on any hard
disc, one might think about using IDs larger than 8 bytes. However, since the ID is
considered an integer, treating IDs larger than 8 bytes is difficult on current CPUs,
which are limited to 64 bit for simple integer operations.
5.1 Overview
Cluster (Master, # ≥ 0, → Table 16, ID: 1F 43 B6 75)
  A Cluster contains video, audio and subtitle data. Note that a Matroska file could
  contain chapter data or attachments, but no multimedia data, so Cluster is not a
  mandatory element.
Tracks (Master, # ≥ 0, → Table 6, ID: 16 54 AE 6B)
  A Tracks element contains the description of some or all tracks (preferably all). This
  element can be repeated once in a while for backup purposes. A file containing only
  chapters and attachments does not have a Tracks element, thus it is not mandatory.
Cues (Master, # ≤ 1, → Table 18, ID: 1C 53 BB 6B)
  The Cues element contains a timestamp-wise index to Clusters, which makes seeking
  easy and fast.
Attachments (Master, # ≤ 1, → Table 26, ID: 19 41 A4 69)
  The Attachments element contains all files attached to this Segment.
Chapters (Master, # = 1, → Table 21, ID: 10 43 A7 70)
  The Chapters element contains the definition of all chapters and editions of this
  Segment.
Tags (Master, # ≤ 1, → Table 28, ID: 12 54 C3 67)
  The Tags element contains further information about the Segment, or about elements
  inside the Segment, that is not really required for playback.
5.2 SegmentInfo
The SegmentInfo element contains general information about the Segment,
such as its duration, the application used for writing the file, the date of creation
and a unique 128 bit ID, to name only a few. Information included in the SegmentInfo
element is not required for playback, but should be written by any Matroska
muxer.
(read: <element name> (<s_size + s_ID>: <size> bytes at <position in file>: value)
PrevFilename (utf-8, # ≤ 1, ID: 3C 83 AB)
  contains the name of the file in which the Segment having the ID $PrevUID is stored.
  PrevFilename should not be considered reliable for the same reason as SegmentFilename;
  however, it could be the first filename the player looks for when the Segment described
  in PrevUID is needed.
NextUID (char[16], # ≤ 1, ID: 3E B9 23)
  contains the unique 128 bit ID of the Segment that is played after the currently active
  Segment, i.e. the ID of the Segment that should be loaded if the user tries to seek to a
  timecode after the end of the active Segment. Like PrevUID, the corresponding Segment
  should be easy to locate.
NextFilename (utf-8, # ≤ 1, ID: 3E 83 BB)
  contains the name of the file in which the Segment having the ID $NextUID is stored.
  NextFilename shall not be considered reliable for the same reason as SegmentFilename.
TimecodeScale (uint, # ≤ 1, ID: 2A D7 B1)
  Each scaled timecode in a Matroska file is multiplied by TimecodeScale to obtain a
  timecode in nanoseconds. Note that not all timecodes are scaled!
Duration (float, # ≤ 1, ID: 44 89)
  indicates the duration of the Segment. The value is scaled, so the duration in
  nanoseconds is $Duration * $TimecodeScale. This element should be written.
Title (utf-8, # ≤ 1, ID: 7B A9)
  contains a general name of the Segment, like "Lord of the Rings - The Two Towers".
  No language can be attached to the title; however, Tags (→ section 5.9) could be used
  to define several titles for a segment. This is not yet commonly done, though.
MuxingApp (string, # = 1, ID: 4D 80)
  contains the name of the library that has been used to create the file (like
  "libmatroska 0.7.0"). This element should be written by any muxer! Especially when
  non-compliant files are encountered, it helps to know who is to blame for them.
WritingApp (utf-8, # = 1, ID: 57 41)
  contains the name of the application used to create the file (like "mkvmerge 0.8.1").
  This element should be written for the same reason as MuxingApp.
DateUTC (int, # ≤ 1, ID: 44 61)
  contains the production date, measured in nanoseconds relative to Jan 01, 2001,
  0:00:00 GMT+0h.
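The conversions implied by TimecodeScale, Duration and DateUTC can be sketched as follows. The helper names are hypothetical; 978307200 is the number of seconds between the Unix epoch (1970-01-01) and 2001-01-01, both at 0:00:00 UTC.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds between 1970-01-01 and 2001-01-01, both 00:00:00 UTC. */
#define MKV_EPOCH_OFFSET_S INT64_C(978307200)

/* Hypothetical helper: Duration is scaled, so ns = Duration * TimecodeScale. */
double segment_duration_seconds(double duration, uint64_t timecode_scale)
{
    assert(timecode_scale > 0);
    return duration * (double)timecode_scale / 1e9;
}

/* Hypothetical helper: converts DateUTC (ns relative to 2001-01-01 UTC)
   to whole seconds since the Unix epoch, rounding towards minus infinity. */
int64_t dateutc_to_unix_seconds(int64_t date_ns)
{
    int64_t s = date_ns / 1000000000;
    if (date_ns % 1000000000 < 0)
        s -= 1;
    return s + MKV_EPOCH_OFFSET_S;
}
```

For example, with a TimecodeScale of 1000000 (1 ms per tick, a commonly used value), a Duration of 5000.0 corresponds to 5 seconds.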
5.3 SeekHead
The SeekHead element contains a list of positions of Level 1 elements in the
Segment. Each pair (element ID, position) is stored in one Seek element:
Not all Level 1 elements need to be included. Typical SeekHeads either include
a list of all Level 1 elements, or a list of all Level 1 elements except for Clusters
(→ section 5.5). SeekHeads can also include references to other SeekHeads if
there is, for example, a small SeekHead at the beginning of the file and a larger
one at its end.
The following picture illustrates the SeekHead element in a real file. Note that
the EBML Tree Viewer replaced Level 1 IDs in SeekID with their human-readable
names:
5.4 Tracks
The Tracks element contains information about the tracks that are stored in the
Segment, like track type (audio, video, subtitles), the codec used, resolution and
sample rate. All tracks shall be described in one (or more, but preferably only one)
Tracks element.
Each track is described in one TrackEntry. Theoretically, information about one track
could be spread over different TrackEntry elements, with the TrackUID indicating which
track the information applies to; however, it is highly discouraged to stretch the
specification like this.
Also, an empty Tracks element would be rather useless, but should not lead to
a parser error, since the file can be played if all tracks are defined somewhere. In
particular, pure chapter files might have an empty Tracks element if the muxer
doesn't catch the case that no tracks are present and consequently creates an
empty Tracks element.
An example of a TrackEntry element is given later (→ page 26).
TrackType (uint, # = 1 (!), → Table 13, ID: 83)
  defines the type of a track, i.e. video, audio, subtitle etc.
FlagEnabled (bool, # ≤ 1, ID: B9, def: 1)
  When FlagEnabled is 1, the track is used.
FlagDefault (bool, # ≤ 1, ID: 88, def: 1)
  When FlagDefault is 1, the track should be selected by the player by default. If no
  video track and/or no audio track has a default flag, one video track and one audio
  track should be chosen by the player anyway, whereas no subtitle track should be
  enabled if no subtitle track has a default flag.
FlagForced (bool, # ≤ 1, ID: 55 AA, def: 0)
  When FlagForced is 1, the track must be played. When several subtitle tracks are
  forced, the one matching the audio language should be chosen. An example would be a
  subtitle track that cannot be disabled, like the one you find on the German DVD
  "Eiskalte Engel" when you select English audio. Since this flag can only be used to
  apply a restriction on digital content, it must be qualified as Digital Restrictions
  Management.
FlagLacing (bool, # ≤ 1, ID: 9C, def: 0)
  When FlagLacing is 1, the track may contain laced blocks. A parser that supports all
  types of lacing (→ section 6.2) can safely ignore this flag.
MinCache (uint, # ≤ 1, ID: 6D E7, def: 0)
  indicates the number of frames a player must be able to cache during playback. This
  is relevant, for instance, if a native MPEG4 file with frames in coding order is played.
MaxCache (uint, # ≤ 1, ID: 6D F8)
  indicates the maximum cache size a player needs to cache frames. A value of 0 means
  that no cache is required.
DefaultDuration (uint, # ≤ 1, ID: 23 E3 83)
  indicates the number of nanoseconds a frame lasts. This value is applied if no
  $Duration value is indicated for a frame or if lacing (→ section 6.2) is used. A value
  of 0 means that the duration of frames of the track is not necessarily constant (e.g.
  variable frame rate video, or Vorbis audio). DefaultDuration should be written for
  each track with a constant frame rate, since it makes seeking easier.
TrackTimecodeScale (float, # ≤ 1, ID: 23 31 4F)
  Every timecode of a block (cluster timecode + block timecode) is multiplied by this
  value to obtain the real timecode of the block.
Name (utf-8, # ≤ 1, ID: 53 6E)
  contains a human-readable name for the track. Note that you can't define which
  language this track name is in. You have to use Tags (→ section 5.9) if you want to
  use several titles in different languages for the same track.
Language (string, # ≤ 1, ID: 22 B5 9C, def: eng)
  specifies the language of a track, using ISO 639-2 [3]. This is NOT necessarily the
  language of $Name; for example, a German AC3 track could be called "German - AC3 5.1"
  or "Deutsch - AC3 5.1" or "Allemand AC3 5.1" etc.
CodecID (string, # = 1 (!), ID: 86)
  specifies the codec [4] which is used to decode the track.
CodecPrivate (binary, # ≤ 1, ID: 63 A2)
  contains information the codec needs before decoding can be started. An example
  are the Vorbis initialization packets for Vorbis audio.
CodecName (utf-8, # ≤ 1, ID: 25 86 88)
  is a human-readable name of the codec.
AttachmentLink (uint, # ≥ 0, ID: 74 46)
  contains the UID of an attachment that is used by this track.
Video (Master, # ≤ 1, → Table 8, ID: E0)
  contains information that is specific to video tracks.
Audio (Master, # ≤ 1, → Table 9, ID: E1)
  contains information that is specific to audio tracks.
ContentEncodings (Master, # ≤ 1, → Table 10, ID: 6D 80)
  contains information about (lossless) compression or encryption of the track.

[3] http://lcweb.loc.gov/standards/iso639-2/englangn.html
[4] http://matroska.org/technical/specs/codecid/index.html
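Following the descriptions of TrackTimecodeScale, TimecodeScale and DefaultDuration, the real time of a block and the frame rate of a constant frame rate track might be computed like this. The names are hypothetical, and treating the block timecode as a signed 16 bit value relative to the cluster timecode is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: real timecode of a block in nanoseconds:
   (cluster timecode + block timecode) * TrackTimecodeScale * TimecodeScale. */
double block_time_ns(uint64_t cluster_timecode, int16_t block_timecode,
                     double track_timecode_scale, uint64_t timecode_scale)
{
    assert(timecode_scale > 0);
    double scaled = (double)cluster_timecode + (double)block_timecode;
    return scaled * track_timecode_scale * (double)timecode_scale;
}

/* Frame rate implied by DefaultDuration (ns per frame). A DefaultDuration
   of 0 means "not necessarily constant"; this sketch then returns 0.0. */
double frame_rate_from_default_duration(uint64_t default_duration_ns)
{
    return default_duration_ns ? 1e9 / (double)default_duration_ns : 0.0;
}
```

For instance, a DefaultDuration of 40000000 ns corresponds to 25 frames per second.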
Obviously, the Video element must be present for video tracks, whereas the Audio
element must be present for audio tracks. Although it doesn't make sense to have
both elements in one TrackEntry element, it wouldn't make a file unplayable.
PixelCropLeft (uint, # ≤ 1, ID: 54 CC, def: 0)
  Number of pixels to be cropped from the left
PixelCropRight (uint, # ≤ 1, ID: 54 DD, def: 0)
  Number of pixels to be cropped from the right
DisplayWidth (uint, # ≤ 1, ID: 54 B0, def: $PixelWidth)
  Width of the video during playback
DisplayHeight (uint, # ≤ 1, ID: 54 BA, def: $PixelHeight)
  Height of the video during playback
DisplayUnit (uint, # ≤ 1, ID: 54 B2, def: 0)
  Unit in which $DisplayWidth and $DisplayHeight are measured: 0 → pixels,
  1 → centimeters, 2 → inches
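A sketch of how a player might derive the display aspect ratio from these elements. Using 0 to mean "element not present" is a convention of this sketch only; in that case the defaults $PixelWidth and $PixelHeight apply. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: display aspect ratio of a video track. A display
   value of 0 stands for "element not present", so DisplayWidth and
   DisplayHeight fall back to PixelWidth and PixelHeight. The ratio does not
   depend on DisplayUnit, as both values are given in the same unit. */
double display_aspect_ratio(uint64_t pixel_width, uint64_t pixel_height,
                            uint64_t display_width, uint64_t display_height)
{
    uint64_t w = display_width ? display_width : pixel_width;
    uint64_t h = display_height ? display_height : pixel_height;
    assert(h != 0);
    return (double)w / (double)h;
}
```

A 720x576 track with DisplayWidth 1024 and DisplayHeight 576 would thus be shown at 16:9, while without the Display elements it would be shown at 5:4.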
OutputSamplingFrequency (float, # ≤ 1, ID: 78 B5)
  Indicates the sample rate, in Hz, at which the track must be played. The default value
  of this element is equal to $SamplingFrequency.
Channels (uint, # ≤ 1, ID: 9F, def: 1)
  Number of channels of the audio track
BitDepth (uint, # ≤ 1, ID: 62 64)
  Bits per sample; this is usually used with PCM audio.
Table 10: The ContentEncodings element, child of TrackEntry (→ Table 7)
ContentEncoding (Master, # ≥ 1, → Table 11, ID: 62 40)
  A ContentEncoding element describes one compression or encryption that has been
  used on this track.
ContentEncodingScope (uint, # ≤ 1, → Table 14, ID: 50 32, def: 1)
  Defines which parts of the track are compressed or encrypted this way.
ContentEncodingType (uint, # ≤ 1, ID: 50 33, def: 0)
  Describes which type of encoding is used: 0 → compression, 1 → encryption.
ContentCompression (Master, # ≤ 1, → Table 12, ID: 50 34)
  If ContentEncodingType = 0, this element describes how the track is compressed.
ContentEncryption (Master, # ≤ 1, (→??), ID: 50 35)
  If ContentEncodingType = 1, this element describes how the track is encrypted.
The ContentEncoding element allows applying not only encryption, but also
lossless compression to a track. This can be used to compress text subtitles, but
also to remove sync headers from audio packets. For example, each AC3 frame
starts with 0B 77, and there is no real point in storing those two bytes for each
frame in a Matroska file. In a plain AC3 file, on the other hand, the sync word does
make sense, because it can be used to find the start of the next frame if data is damaged.
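Undoing header stripping ($ContentCompAlgo=3, whose stripped bytes are stored in ContentCompSettings) amounts to prepending those bytes to each stored frame. A minimal sketch with a hypothetical function name, using the AC3 sync word 0B 77 from the example above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: restores a frame stored with header stripping by
   prepending the bytes from ContentCompSettings. Returns the length of the
   restored frame, or 0 if 'cap' is too small. */
size_t restore_stripped_frame(uint8_t *out, size_t cap,
                              const uint8_t *settings, size_t settings_len,
                              const uint8_t *frame, size_t frame_len)
{
    assert(out != NULL && settings != NULL && frame != NULL);
    if (cap < settings_len || cap - settings_len < frame_len)
        return 0;
    memcpy(out, settings, settings_len);            /* stripped header first */
    memcpy(out + settings_len, frame, frame_len);   /* then the stored data */
    return settings_len + frame_len;
}
```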
ContentCompSettings (binary, # ≤ 1, ID: 42 55)
  Contains settings that are required for decompression. These settings are specific to
  each compression algorithm. For example, it contains the stripped header bytes when
  $ContentCompAlgo=3 (→ page 26).
Here is one example of a possible TrackEntry element: a DTS audio track that
uses header stripping. The ContentCompSettings element contains the four
bytes each DTS frame starts with.
5.5 Cluster
A Cluster contains multimedia data and usually spans a few
seconds. The following picture shows a typical cluster:
PrevSize (uint, # ≤ 1, ID: AB)
  Indicates the size of the preceding cluster in bytes. This helps to seek backwards and
  to find the preceding cluster without having to look at MetaSeek or Cue data. It is
  also helpful to resync, e.g. if the EBML-ID of the preceding Cluster is damaged.
BlockGroup (Master, # ≥ 0, → Table 17, ID: A0)
  Contains a Block along with some attached information like references.
SimpleBlock (binary, # ≥ 0, ID: A3)
  This is a Block (→ page 42) without additional attached information. Since a
  SimpleBlock does not require a BlockGroup around it, it causes less overhead.
  SimpleBlock was introduced in Matroska v2.
5.6 Cues
The Cues element contains information helpful (but not necessary) for seeking.
Each piece of information, called a CuePoint, contains a timestamp and a list of
pairs (track number, (cluster position[, block number within cluster])). Generally,
a CuePoint should only point to keyframes.
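Seeking with Cues then boils down to finding the last CuePoint whose timestamp is not later than the target time and starting to decode at the referenced Cluster. The sketch below assumes the CuePoints are sorted by time and reduces each one to a single (time, cluster position) pair; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified CuePoint: one timestamp and one cluster position. */
typedef struct {
    uint64_t time;             /* timestamp of the cue point */
    uint64_t cluster_position; /* where the referenced cluster starts */
} CuePoint;

/* Binary search: returns the index of the last cue point with
   time <= target, or -1 if the target lies before the first cue point. */
long find_cue(const CuePoint *cues, size_t count, uint64_t target)
{
    assert(cues != NULL || count == 0);
    size_t lo = 0, hi = count;          /* search in [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cues[mid].time <= target)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (long)lo - 1;                /* lo = number of cues <= target */
}
```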
The Chapters element contains a list of all editions and chapters found in this
Segment. Chapters in Matroska files are more powerful than chapters on DVDs;
their handling, however, is also far more complex.
An edition contains one set of chapter definitions, so having several editions means
having several sets of chapter definitions. This is used, for example, when an edition
serves as a playlist: the chapters are played one after the other, even though there
may be gaps between them.
EditionFlagOrdered (bool, # ≤ 1, ID: 45 DD, def: 0)
  When $EditionFlagOrdered is 1, this edition contains a playlist. When
  $EditionFlagOrdered is 0, it contains a simple DVD-like chapter definition.
ChapterAtom (Master, # ≥ 1, → Table 23, ID: B6)
  One ChapterAtom contains the definition of one chapter. This element is the only one
  in Matroska files that can contain itself recursively, in this case to define subchapters.
Table 23: The ChapterAtom element, child of EditionEntry (→ Table 22),
child of ChapterAtom (→ Table 23)
ChapterUID (uint, # = 1, ID: 73 C4)
  The UID of this chapter. It must be unique within the file.
ChapterTimeStart (uint, # ≤ 1, ID: 91, def: 0)
  The unscaled timecode the chapter starts at. As the value is unsigned, a chapter
  cannot start earlier than at timecode 0, even though timecodes down to about -30.000
  are possible for multimedia data.
ChapterTimeEnd (uint, # ≤ 1, ID: 92)
  The unscaled timecode the chapter ends at. The default value is the start of the next
  chapter, or the end of the parent chapter, or the end of the segment, whichever of
  these exists first, in that order.
ChapterFlagHidden (bool, # ≤ 1, ID: 98, def: 0)
  When $ChapterFlagHidden is 1, the chapter should not be visible in the user interface,
  but should be played back normally.
ChapterFlagEnabled (bool, # ≤ 1, ID: 45 98, def: 1)
  When $ChapterFlagEnabled is 0, the chapter should be skipped by the player.
ChapterSegmentUID (char[16], # ≤ 1, ID: 6E 67)
  This element can only occur if $EditionFlagOrdered=1. The Segment whose UID is
  $ChapterSegmentUID is used instead of the current Segment. Obviously, this Segment
  should be easy to find, for example when it is the first segment of a file in the same
  directory.
ChapterSegmentEditionUID (uint, # ≤ 1, ID: 6E BC)
  The edition to use inside the Segment selected via ChapterSegmentUID. The timecodes
  $ChapterTimeStart and $ChapterTimeEnd refer to playback timecodes of that edition,
  i.e. the timecodes are relative to that playlist. This is called "nested Editions" and is
  NOT SUPPORTED by Haali Media Splitter.
ChapterTracks (Master, # ≤ 1, → Table 24, ID: 8F)
  Contains a list of tracks the chapter applies to.
ChapterDisplay (Master, # ≥ 0, → Table 25, ID: 80)
  Contains all chapter titles.
A useful application of the ChapterFlagHidden element in connection with
ordered editions is the following: you have a couple of episodes of a series, but
want to save space by storing the intro and outro only once. You create one playlist
(ordered edition) per episode, and another playlist playing all episodes in a row.
Whereas in the first case you might want to play intro and outro for each episode,
you might not want to do that in the second case.
If you don't want to make the three parts intro - movie - outro selectable via the
user interface when playing single episodes, you call the intro chapter "Episode
- blah" and hide the movie and the outro chapter using $ChapterFlagHidden=1.
Then, the playlist playing all episodes would be intro - episode 1 - episode
2 - ... - last episode - outro, whereas the other playlists would be intro - episode N
- outro. The name of the intro chapter would be set to "Episode n".
Table 24: The ChapterTracks element, child of ChapterAtom (→ Table 23)
ChapterTrackNumber (uint, # ≥ 1, ID: 89)
  One number of a track the chapter is used with.
Table 25: The ChapterDisplay element, child of ChapterAtom (→ Table 23)
ChapString (utf-8, # ≤ 1, ID: 85)
  A title of a chapter
ChapLanguage (string, # ≥ 0, ID: 43 7C, def: eng)
  The language of $ChapString as defined in ISO 639-2 [5]
ChapCountry (utf-8, # ≥ 0, ID: 43 7E)
  A country the title is used in. For example, a German title in Germany might be
  different from the title used in Austria.

[5] http://lcweb.loc.gov/standards/iso639-2/englangn.html#two
5.8 Attachments
Theoretically, any file type can be attached to a Matroska file; however, this
possibility is usually used to attach pictures like CD covers, or fonts required to
display a subtitle track correctly. Obviously, attaching executable files would allow
Matroska files to contain viruses, a scenario that is not exactly the intended
application of attachments or of anything else Matroska is capable of.
5.9 Tags
Tags provide additional information [6] that is not important for playback. A Tags
element contains a number of Tag elements. Each Tag element contains a list of UIDs
(usually TrackUIDs or EditionUIDs) and a list of SimpleTags, each one containing
a name and a value:
[6] http://www.matroska.org/technical/specs/tagging/index.html
If no Targets are specified, the Tag is a global Tag referring to the entire
Segment. Of course, two different Tag elements can contain identical Targets.
SimpleTag (Master, # ≥ 1, → Table 31, ID: 67 C8)
  Each SimpleTag contains one tag that applies to each target in Targets.
• TITLE, Target: EditionUID: used to define names for Editions. This is exactly
what you can see in the screenshot above.
The following flags are only defined for Matroska v2 and can thus only be used
in a SimpleBlock: keyframe, invisible, discardable. The type of lacing in use
defines how the SIZE values are to be read.
6.2 Lacing
Lacing is a technique for storing more than one unit of data (like one
audio frame) in one block, with the goal of decreasing overhead, without losing the
ability to separate the frames in a lace again later.
Generally, the size of the last frame in a Lace is not stored, as it can be derived
from the total block size, the size of the block header and the sum of the sizes of
all other frames.
Frame duration values are not preserved! It is therefore highly recommended
not to use lacing if the frame duration is not constant, as with Vorbis audio.
The size of each frame is coded as a sum of bytes (uint8). A value smaller than 255
indicates that the next value refers to the next frame.
Example
size = { 187, 255, 255, 120, 255, 0, 60 } codes the sizes of 4 frames:
187, 630, 255 and 60 bytes.
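A sketch of decoding these size values; xiph_lace_sizes is a hypothetical name. In a real lace, the caller would ask for one size less than the number of frames, since the size of the last frame is derived from the block size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decodes Xiph-style lace sizes. Each size is a sum of
   bytes; a byte smaller than 255 ends the current size. Reads max_sizes
   sizes from p and returns how many bytes were consumed, or -1 if the data
   ends in the middle of a size. */
long xiph_lace_sizes(const uint8_t *p, size_t len,
                     uint64_t *sizes, size_t max_sizes)
{
    assert(p != NULL || len == 0);
    size_t pos = 0;
    for (size_t n = 0; n < max_sizes; n++) {
        uint64_t size = 0;
        uint8_t byte;
        do {
            if (pos >= len)
                return -1;
            byte = p[pos++];
            size += byte;
        } while (byte == 255);          /* 255 means "continue this size" */
        sizes[n] = size;
    }
    return (long)pos;
}
```

Applied to the example above, it consumes all 7 bytes and yields 187, 630, 255 and 60.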
Fixed Lacing is used if all frames in a lace have the same size. Examples are AC3
or DTS audio. In this case, knowing the number of frames is enough to calculate
the size of one frame. Consequently, there are no size values.
The scope of this section is to explain how to predict the overhead of a Matroska
file before muxing, without analysing any of the source files excessively. This
section assumes that BlockGroups and Blocks are used, and that no SimpleBlocks
are used. If you want to estimate the overhead of files that use SimpleBlocks, you
get about the same overhead as with Blocks without BlockDuration, ReferenceBlock
or BlockGroup.
BlockGroup <size>
  Block <size> <number, timecode, flags>
  [ Reference <size> <val> ]
The EBML IDs of Blocks and BlockGroups are 1 byte each, so the
structure above, not counting References, takes:
BlockGroups larger than 2 MBytes are extremely unlikely, and even BlockGroups
larger than 16 kBytes won't occur often, compared to BlockGroups between 128
bytes and 16 kBytes. That means that assuming an overhead of 10 bytes for
BlockGroups without References usually results in a good approximation.
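The following sketch estimates the bytes a BlockGroup adds around its frame data. The assumptions are those of this sketch, not statements of this section: a 1 byte track number, a 2 byte block timecode, 1 flags byte, 1 extra byte for the frame count when the block is laced (lace size fields are not counted), and size fields written as the shortest possible vint. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Shortest EBML size vint for 'size' (value bits must not be all ones). */
static int size_vint_length(uint64_t size)
{
    int n = 1;
    while (n < 8 && size >= (UINT64_C(1) << (7 * n)) - 1)
        n++;
    return n;
}

/* Hypothetical estimate of the overhead of one BlockGroup without
   References around 'payload' bytes of frame data. */
uint64_t blockgroup_overhead(uint64_t payload, int laced)
{
    assert(laced == 0 || laced == 1);
    /* track number (1) + timecode (2) + flags (1) + frame count if laced */
    uint64_t block_data = 1 + 2 + 1 + (uint64_t)laced + payload;
    uint64_t block = 1 + size_vint_length(block_data) + block_data;   /* Block ID + size */
    uint64_t group = 1 + size_vint_length(block) + block;             /* BlockGroup ID + size */
    return group - payload;
}
```

Under these assumptions, a 1000 byte frame gets the 10 bytes mentioned above, a 100 byte frame 8 bytes, and laces of 8, 9 and 10 AC3 frames of 1792 bytes get 11, 11 and 13 bytes.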
7.1.1 video
In a typical video stream, there are a lot of frames with 1 Reference (P-frames,
delta frames), and a few keyframes. Typical ratios are 100:1. There might also
be frames with 2 References (B-frames), e.g. in native MPEG4 streams. Assuming
a ratio of 66:33:1 for B:P:K, and assuming a bitrate far below 3,2 MBit/s (meaning
that typical B- and P-frames are smaller than 16 kB), this causes about 15 bytes of
overhead per frame. If there are no B-frames, there are about 13 bytes per frame.
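These two figures can be reproduced with a weighted average, assuming 10 bytes per BlockGroup and 3 bytes per Reference (1 byte ID, 1 byte size, 1 byte value); the 3 byte figure is an assumption of this sketch, not a value stated above. The function name is hypothetical.

```c
#include <assert.h>

/* Hypothetical estimate: average overhead per frame for a mix of B-frames
   (2 References), P-frames (1 Reference) and keyframes (none), assuming
   10 bytes per BlockGroup and 3 bytes per Reference. */
double video_overhead_per_frame(double b_share, double p_share, double k_share)
{
    const double base = 10.0, per_reference = 3.0;
    double total = b_share + p_share + k_share;
    assert(total > 0.0);
    return (b_share * (base + 2 * per_reference) +
            p_share * (base + per_reference) +
            k_share * base) / total;
}
```

With B:P:K = 66:33:1 this gives (66 * 16 + 33 * 13 + 1 * 10) / 100 = 14,95 bytes, and with 100:1 P-frames to keyframes about 12,97 bytes.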
As audio usually does not have any References (all audio frames are keyframes),
one audio frame will take 8 or 10 bytes of overhead. For MP3, AC3, DTS and AAC,
frames causing only 8 bytes of overhead are unlikely; they are more likely for Vorbis.
Example: AC3 audio, 448 kbps, 1792 bytes per frame, 32ms per frame
1.) 8 frames per lace.
overhead for one frame = 11/8 = 1,375 bytes = 1 byte / 23,3 ms.
2.) 9 frames per lace.
overhead for one frame = 11/9 = 1,222 bytes = 1 byte / 26,2 ms.
3.) 10 frames per lace.
overhead for one frame = 13/10 = 1,3 bytes = 1 byte / 24,6 ms.
An AC3 stream of 2 hours with 9 frames per lace will cause 270kB of overhead.
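The 2 hour total can be checked with a few lines of integer arithmetic (hypothetical helper; the inputs are taken from the example): 7 200 000 ms / 32 ms = 225 000 frames, i.e. 25 000 laces of 9 frames with 11 bytes each.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: total overhead of a stream stored in laces,
   i.e. the number of laces (rounded up) times the overhead per lace. */
uint64_t total_lace_overhead(uint64_t duration_ms, uint64_t frame_ms,
                             uint64_t frames_per_lace, uint64_t bytes_per_lace)
{
    assert(frame_ms > 0 && frames_per_lace > 0);
    uint64_t frames = duration_ms / frame_ms;
    uint64_t laces = (frames + frames_per_lace - 1) / frames_per_lace;
    return laces * bytes_per_lace;
}
```

This gives 275 000 bytes (about 269 KiB), in line with the rounded 270kB; 8 frames per lace would give 309 375 bytes and 10 frames per lace 292 500 bytes.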
2. no CBR, but almost all frames smaller than 255 bytes: Xiph lacing
In this case, Xiph lacing (see section 6.2.1) is used, meaning that the overhead of
a BlockGroup equals the normal BlockGroup overhead plus frame_count, so that
the overhead per frame is about (11 + frame_count)/frame_count bytes if there
are frame_count frames in each lace. Again, if the BlockGroups are larger than
16 kB, the overhead is (13 + frame_count)/frame_count.
In other words, the ratio in bytes / frame will always be between about 1.2 and
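A sketch of the Xiph lace header, following the Matroska lacing layout: one byte holding the number of frames minus one, then the size of every frame except the last, whose size is implied by the Block size. Each size is written as a run of bytes with value 255 followed by one byte below 255, so a frame under 255 bytes costs exactly 1 byte, and a lace of frame_count such frames has a header of frame_count bytes. The function name is hypothetical.

```python
def xiph_lace_header(sizes: list[int]) -> bytes:
    """Build the Xiph lace header for a Block holding frames of the given sizes."""
    header = bytearray([len(sizes) - 1])   # number of frames minus one
    for size in sizes[:-1]:                 # the last frame's size is implied
        header += b"\xff" * (size // 255)  # one 255 byte per full 255
        header.append(size % 255)           # terminating byte below 255
    return bytes(header)


small = xiph_lace_header([200, 180, 230, 190])
print(len(small), list(small))
print(list(xiph_lace_header([300, 10])))
```

The first call yields the 4-byte header [3, 200, 180, 230] for 4 small frames; a 300-byte frame needs 2 size bytes (255, 45).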
3. otherwise: EBML lacing
Assuming that the difference in size between 2 consecutive frames is smaller than
8191, 1 or 2 bytes are needed to code the size of each frame, in addition to the
normal BlockGroup overhead.
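EBML lacing stores the first frame size as an unsigned vint and each following size (except the implied last one) as a signed vint holding the difference to the previous size. A signed vint of n bytes covers -(2^(7n-1) - 1) to 2^(7n-1) - 1, i.e. up to 63 in 1 byte and up to 8191 in 2 bytes, which is where the 8191 limit comes from. The sketch below counts the lace header bytes; the function names are hypothetical.

```python
def vint_len(value: int) -> int:
    """Length of an unsigned EBML vint (the all-ones value is reserved)."""
    n = 1
    while value > (1 << (7 * n)) - 2:
        n += 1
    return n


def svint_len(diff: int) -> int:
    """Length of a signed EBML vint: n bytes cover -(2**(7n-1) - 1) .. 2**(7n-1) - 1."""
    n = 1
    while abs(diff) > (1 << (7 * n - 1)) - 1:
        n += 1
    return n


def ebml_lace_header_len(sizes: list[int]) -> int:
    """Bytes of an EBML lace header: frame count, first size, then size differences."""
    length = 1 + vint_len(sizes[0])            # frame-count byte + first frame size
    for prev, cur in zip(sizes, sizes[1:-1]):  # the last frame's size is implied
        length += svint_len(cur - prev)
    return length


print(ebml_lace_header_len([600, 650, 560, 700]))  # 1 + 2 + 1 + 2 = 6
```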
ulator’). Note that the simulation would have to be run, and its results evaluated
as follows, for each audio format, at each bitrate and possibly even with each
encoder for which results as accurate as possible are to be predicted.
The results for the lace header size are as follows:
Lace header overhead per frame (bytes) @ <x> frames per lace
Audio format          4     8     12    16    24    32    48    64    96
MP3 @ 128 kbps        1.39  1.29  1.26  1.24  1.22  1.22  1.21  1.20  1.20
MP3 @ 192 kbps        1.50  1.41  1.38  1.37  1.36  1.35  1.34  1.34  1.33
HE-AAC @ 224 kbps     1.39  1.29  1.25  1.24  1.22  1.21  1.20  1.20  1.20
HE-AAC @ 64 kbps      1.34  1.23  1.19  1.18  1.16  1.15  1.14  1.14  1.13
LC-AAC @ 268 kbps     1.31  1.19  1.16  1.14  1.12  1.11  1.10  1.09  1.09
Applications using libmatroska to create Matroska files use 8 frames per lace.
As a consequence, the overhead for a track using EBML lacing can be predicted
with acceptable accuracy if the audio format is known.
As the table also shows, beyond a certain lace size, larger laces hardly reduce the
overhead caused by the lace headers of Blocks any further.
However, larger laces mean fewer Blocks and thus fewer BlockGroups, so the total
overhead per frame, including the overhead outside of the Blocks, is worth a look.
Here are the results for the same test files as above:
Overhead per frame (bytes) @ <x> frames per lace
Audio format          4     8     12    16    24    32    48    64    96
MP3 @ 128 kbps        4.14  2.67  2.17  1.93  1.68  1.56  1.48  1.41  1.33
MP3 @ 192 kbps        4.25  2.79  2.30  2.06  1.81  1.75  1.61  1.54  1.47
HE-AAC @ 224 kbps     4.14  2.66  2.23  2.05  1.76  1.62  1.48  1.40  1.33
HE-AAC @ 64 kbps      4.09  2.61  2.11  1.86  1.62  1.49  1.40  1.34  1.27
LC-AAC @ 268 kbps     4.06  2.57  2.07  1.82  1.66  1.51  1.37  1.30  1.22
Now let's take the second table and work out how much overhead that means in a
real movie of 2 hours.
For the MP3 files used in that example, one frame lasts 24 ms. For our LC-AAC
file, one frame lasts 23.22 ms, and for the HE-AAC file 46.44 ms.
Thus a file of 2 hours will have approximately the following number of frames:
MP3 - 300,000
LC-AAC - 310,000
HE-AAC - 155,000.
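The frame counts follow from dividing 2 hours by the frame duration. In this sketch the durations are derived from samples per frame and sample rate; the sample rates (48 kHz for the MP3 file, 44.1 kHz for the AAC files) are assumptions chosen because they reproduce the quoted 24 ms, 23.22 ms and 46.44 ms, not values stated in the text.

```python
TWO_HOURS_S = 2 * 60 * 60

# (samples per frame, sample rate in Hz); the sample rates are assumed
FORMATS = {
    "MP3": (1152, 48_000),     # 24 ms per frame
    "LC-AAC": (1024, 44_100),  # 23.22 ms per frame
    "HE-AAC": (2048, 44_100),  # 46.44 ms per frame
}


def frame_count(seconds: float, samples_per_frame: int, rate_hz: int) -> int:
    """Number of audio frames needed to cover `seconds` of audio."""
    return round(seconds * rate_hz / samples_per_frame)


for name, (spf, rate) in FORMATS.items():
    ms = 1000 * spf / rate
    print(f"{name}: {ms:.2f} ms/frame, {frame_count(TWO_HOURS_S, spf, rate):,} frames")
```

This gives 300,000, 310,078 and 155,039 frames, which the text rounds to 300,000, 310,000 and 155,000.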
First, let's use the default setting of libmatroska (8 frames per lace) and calculate
the overhead that a muxing application using libmatroska would cause when muxing
those files into a movie:
With 24 frames per lace, an MP3 Block would have a duration of 576 ms, and an
HE-AAC Block even about 1 second. This means that, when seeking in a file, the
audio may awkwardly seem to be missing for a moment. Thus, laces longer than
1 second are highly discouraged. Nevertheless, let's analyze the overhead in our
file for laces of 24 and 96 frames each, and compare it to the overhead caused by
libmatroska. Here is the corresponding table:
Total overhead in a 2-hour file (kB, 1 kB = 1024 bytes) @ <x> frames per lace
Audio format          8       24      96
MP3 @ 128 kbps        782     492     389
MP3 @ 192 kbps        817     530     430
HE-AAC @ 224 kbps     402     266     201
HE-AAC @ 64 kbps      395     245     192
LC-AAC @ 268 kbps     778     502     369
As you can see, putting 24 frames in one Block instead of 8 saves some overhead.
However, putting 96 frames in one Block instead of 24 saves less overhead than
going from 8 to 24. As 96 frames per lace will usually make seeking uncomfortable,
it is recommended not to put more than about 24 frames in one Block.
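The table is the frame count multiplied by the overhead per frame from the second table, in units of 1024 bytes and truncated. The following sketch reproduces every entry from the rounded frame counts (300,000, 310,000 and 155,000); the helper name is hypothetical.

```python
FRAMES = {"MP3": 300_000, "LC-AAC": 310_000, "HE-AAC": 155_000}

# bytes of overhead per frame at 8, 24 and 96 frames per lace (second table)
PER_FRAME = {
    "MP3 @ 128 kbps": ("MP3", (2.67, 1.68, 1.33)),
    "MP3 @ 192 kbps": ("MP3", (2.79, 1.81, 1.47)),
    "HE-AAC @ 224 kbps": ("HE-AAC", (2.66, 1.76, 1.33)),
    "HE-AAC @ 64 kbps": ("HE-AAC", (2.61, 1.62, 1.27)),
    "LC-AAC @ 268 kbps": ("LC-AAC", (2.57, 1.66, 1.22)),
}


def total_overhead_kb(frames: int, bytes_per_frame: float) -> int:
    """Overhead of a whole file in kB (1 kB = 1024 bytes), truncated."""
    return int(frames * bytes_per_frame / 1024)


for label, (fmt, overheads) in PER_FRAME.items():
    print(label, [total_overhead_kb(FRAMES[fmt], o) for o in overheads])
```

Going from 8 to 24 frames per lace saves roughly 270 to 290 kB for the MP3 and LC-AAC files, while going from 24 to 96 saves only about 100 to 130 kB.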
Although most of the overhead is caused by BlockGroups, the overhead caused by
Clusters themselves is noticeable as well.
Here again is the basic layout of a Cluster:
Cluster <size>
[ CRC32 ]
TimeCode <size> <timecode>
[ PrevClusterSize <size> <prevsize> ]
[ Position <size> <position> ]
{ BlockGroup }
As typical movie files are designed to fit on 1 or 2 CDs, or so that 2 or 3 of them fill
one DVD, point 2 will be true for most of the Clusters in typical files.
With the above-mentioned restrictions on Clusters, the overhead inside one Cluster
will be:
• CRC32: 6 bytes
• Timecode: 5 bytes
• Position: 5 bytes
Depending on the muxing settings, the overhead caused by one Cluster will be
between 12 and 45 bytes.
Example: assuming a size of 1 MB per Cluster, this means an overhead rate of
0.001% to 0.005%, or up to 100 kB in a 2 GB file.
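The percentages follow from dividing the per-Cluster overhead by the Cluster size. A short sketch, assuming 1 MB = 2^20 bytes and 2 GB = 2^31 bytes (the text does not say which units it uses); the helper name is hypothetical.

```python
CLUSTER_BYTES = 1 << 20  # 1 MB per Cluster (assumed binary megabyte)
FILE_BYTES = 1 << 31     # 2 GB file (assumed binary gigabytes)


def cluster_overhead(per_cluster: int) -> tuple[float, int]:
    """Return (overhead in percent, total overhead in bytes for the whole file)."""
    percent = 100 * per_cluster / CLUSTER_BYTES
    total = FILE_BYTES // CLUSTER_BYTES * per_cluster
    return percent, total


for per_cluster in (12, 45):
    percent, total = cluster_overhead(per_cluster)
    print(f"{per_cluster} bytes/Cluster: {percent:.4f} %, {total / 1024:.0f} kB per file")
```

The upper bound, 2048 Clusters of 45 bytes each, is 92,160 bytes (90 kB), hence the estimate of up to 100 kB.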
CuePoint <size>
CueTime <size> <time>
{ CueTrackPositions <size>
CueClusterPosition <size> <position>
CueTrack <size> <track>
[ CueBlockNumber <size> <block number> ]
}
Assuming that a CuePoint points into only one track, the overhead is:
• CuePoint: 2 bytes
• CueTime: 5 bytes
• CueTrackPositions: 2 bytes
• CueClusterPosition: 6 bytes
• CueTrack: 3 bytes
• CueBlockNumber: 4 bytes
Total: 22 bytes.
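The 22 bytes can be recomputed from the element IDs and payload widths. The IDs are the Matroska ones (CuePoint 0xBB, CueTime 0xB3, CueTrackPositions 0xB7, CueClusterPosition 0xF1 and CueTrack 0xF7, all 1 byte; CueBlockNumber 0x5378, 2 bytes); the payload widths (3-byte time, 4-byte Cluster position, 1-byte track and block number) are assumptions that match the list above. The helper names are hypothetical.

```python
def vint_len(value: int) -> int:
    """Minimal length of an EBML size field (the all-ones value is reserved)."""
    n = 1
    while value > (1 << (7 * n)) - 2:
        n += 1
    return n


def element(id_bytes: int, payload: int) -> int:
    """Total size of an EBML element: ID + size field + payload."""
    return id_bytes + vint_len(payload) + payload


cue_time = element(1, 3)       # CueTime, 3-byte timecode        -> 5
cluster_pos = element(1, 4)    # CueClusterPosition, 4 bytes     -> 6
cue_track = element(1, 1)      # CueTrack, 1 byte                -> 3
block_number = element(2, 1)   # CueBlockNumber, 2-byte ID       -> 4
positions = element(1, cluster_pos + cue_track + block_number)  # 2 + 13 = 15
cue_point = element(1, cue_time + positions)                    # 2 + 20 = 22
print(cue_point)
```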
8 Links
http://www.matroska.org
http://haali.cs.msu.ru/mkv/
http://www-user.tu-chemnitz.de/~noe/Video-Zeug/
http://de.wikipedia.org/wiki/Matroska
http://www.matroska.info/
http://ld-anime.faireal.net/guide/jargon.matroska-en