FFmpeg libav tutorial
github.com/leandromoreira/ffmpeg-libav-tutorial
Most of the code in here will be in C, but don't worry: you can easily
understand and apply it to your preferred language. FFmpeg libav has lots
of bindings for many languages, like Python and Go, and even if your language
doesn't have one, you can still use it through FFI (here's an example
with Lua).
We'll start with a quick lesson about what video, audio, codecs and
containers are, then we'll move to a crash course on how to use the FFmpeg
command line, and finally we'll write code. Feel free to skip directly to the
section Learn FFmpeg libav the Hard Way.
Some people used to say that Internet video streaming is the future of
traditional TV; in any case, FFmpeg is something that is worth
studying.
Intro
audio - what you listen!
Although a muted video can express a variety of feelings, adding sound to it
brings more pleasure to the experience.
Sound is the vibration that propagates as a wave of pressure, through the air
or any other transmission medium, such as a gas, liquid or solid.
container
A single file that contains all the streams (mostly the audio and video),
and it also provides synchronization and general metadata, such as
title, resolution, etc.
Usually we can infer the format of a file by looking at its extension: for
instance a video.webm is probably a video using the container webm.
FFmpeg has a command line program called ffmpeg , a very simple yet powerful
binary. For instance, you can convert from mp4 to the container avi just
by typing the following command:
$ ffmpeg -i input.mp4 output.avi
To make things short, the FFmpeg command line program expects the
following argument format to perform its actions ffmpeg {1} {2} -i {3}
{4} {5} , where:
1. global options
2. input file options
3. input url
4. output file options
5. output url
The parts 2, 3, 4 and 5 can be as many as you need. It's easier to understand
this argument format in action:
$ ffmpeg \
-y \ # global options
-c:a libfdk_aac -c:v libx264 \ # input options
-i bunny_1080p_60fps.mp4 \ # input url
-c:v libvpx-vp9 -c:a libvorbis \ # output options
bunny_1080p_60fps_vp9.webm # output url
This command takes an input mp4 file containing two streams (an audio
encoded with the aac CODEC and a video encoded with the h264 CODEC) and
converts it to webm , changing its audio and video CODECs too.
We could simplify the command above but then be aware that FFmpeg will
adopt or guess the default values for you. For instance when you just type
ffmpeg -i input.avi output.mp4 what audio/video CODEC does it use
to produce the output.mp4 ?
Transcoding
What? the act of converting one of the streams (audio or video) from one
CODEC to another one.
Why? sometimes some devices (TVs, smartphones, consoles, etc.) don't
support X but do support Y, and newer CODECs provide better compression rates.
$ ffmpeg \
-i bunny_1080p_60fps.mp4 \
-c:v libx265 \
bunny_1080p_60fps_h265.mp4
Transmuxing
What? the act of converting from one format (container) to another one.
Why? sometimes some devices (TVs, smartphones, consoles, etc.) don't
support X but do support Y, and sometimes newer containers provide modern
required features.
$ ffmpeg \
-i bunny_1080p_60fps.mp4 \
-c copy \ # just saying to ffmpeg to skip encoding
bunny_1080p_60fps.webm
Transrating
What? the act of changing the bit rate, or producing other renditions.
Why? people will try to watch your video on a 2G (EDGE) connection using
a less powerful smartphone or over a fiber Internet connection on their 4K
TVs, therefore you should offer more than one rendition of the same video
with different bit rates.
How? producing a rendition with bit rate between 3856K and 2000K.
$ ffmpeg \
-i bunny_1080p_60fps.mp4 \
-minrate 964K -maxrate 3856K -bufsize 2000K \
bunny_1080p_60fps_transrating_964_3856.mp4
Transsizing
What? the act of converting from one resolution to another one. As said
before transsizing is often used with transrating.
How? converting a 1080p to a 480p resolution.
$ ffmpeg \
-i bunny_1080p_60fps.mp4 \
-vf scale=480:-1 \
bunny_1080p_60fps_transsizing_480.mp4
Adaptive Streaming
What? the act of producing many resolutions (bit rates), splitting the media
into chunks and serving them via HTTP.
# video streams
$ ffmpeg -i bunny_1080p_60fps.mp4 -c:v libvpx-vp9 -s 160x90 -b:v 250k \
-keyint_min 150 -g 150 -an -f webm -dash 1 video_160x90_250k.webm
# audio streams
$ ffmpeg -i bunny_1080p_60fps.mp4 -c:a libvorbis -b:a 128k -vn -f webm \
-dash 1 audio_128k.webm
PS: I stole this example from the Instructions to playback Adaptive WebM
using DASH
Going beyond
There are many, many other usages for FFmpeg. I use it in conjunction
with iMovie to produce/edit some videos for YouTube, and you can certainly
use it professionally.
Don't you wonder sometimes 'bout sound and vision? David Robert Jones
FFmpeg is composed of several libraries that can be integrated into our own
programs. Usually, when you install FFmpeg, it installs all these libraries
automatically. I'll be referring to the set of these libraries as FFmpeg
libav.
Learn FFmpeg libav the Hard Way
This title is a homage to Zed Shaw's series Learn X the Hard Way,
particularly his book Learn C the Hard Way.
You'll first need to load your media file into a component called
AVFormatContext (the video container is also known as format). It actually
doesn't fully load the whole file: it often only reads the header.
Once we've loaded the minimal header of our container, we can access its
streams (think of them as rudimentary audio and video data). Each stream
will be available in a component called AVStream.
Suppose our video has two streams: an audio encoded with AAC CODEC
and a video encoded with H264 (AVC) CODEC. From each stream we can
extract pieces (slices) of data called packets that will be loaded into
components named AVPacket.
The data inside the packets is still coded (compressed), and in order
to decode the packets, we need to pass them to a specific AVCodec.
The AVCodec will decode them into an AVFrame and finally, this component
gives us the uncompressed frame. Notice that the same
terminology/process is used by both the audio and video streams.
Requirements
Since some people were facing issues while compiling or running the
examples, we're going to use Docker as our development/runner
environment. We'll also use the big buck bunny video, so if you don't have
it locally just run the command make fetch_small_bunny_video .
$ make run_hello
We'll skip some details, but don't worry: the source code is available at
github.
Now we're going to open the file and read its header and fill the
AVFormatContext with minimal information about the format (notice that
usually the codecs are not opened). The function used to do this is
avformat_open_input. It expects an AVFormatContext , a filename
and two optional arguments: the AVInputFormat (if you pass NULL ,
FFmpeg will guess the format) and the AVDictionary (which are the
options to the demuxer).
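Here's a minimal sketch of that call (error handling omitted, and assuming a filename variable holding the path to the media file):
AVFormatContext *pFormatContext = avformat_alloc_context();
// open the file, read its header and fill the format context with minimal information;
// the two NULLs let FFmpeg guess the input format and pass no demuxer options
avformat_open_input(&pFormatContext, filename, NULL, NULL);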
To access the streams , we need to read data from the media. The function
avformat_find_stream_info does that. Now, the pFormatContext-
>nb_streams will hold the amount of streams and the pFormatContext-
>streams[i] will give us the i stream (an AVStream).
avformat_find_stream_info(pFormatContext, NULL);
Now we'll loop through all the streams.
With the codec properties we can look up the proper CODEC with the
function avcodec_find_decoder, which finds the registered decoder for the
codec id and returns an AVCodec, the component that knows how to enCOde
and DECode the stream.
With the codec, we can allocate memory for the AVCodecContext, which will
hold the context for our decode/encode process, but then we need to fill this
codec context with CODEC parameters; we do that with
avcodec_parameters_to_context.
Once we filled the codec context, we need to open the codec. We call the
function avcodec_open2 and then we can use it.
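Putting those steps together, the loop can look roughly like this (a sketch with error checks omitted; in the real program you would also remember which stream is the video one):
for (unsigned int i = 0; i < pFormatContext->nb_streams; i++) {
  AVCodecParameters *pCodecParameters = pFormatContext->streams[i]->codecpar;
  // find the registered decoder for this codec id
  const AVCodec *pCodec = avcodec_find_decoder(pCodecParameters->codec_id);
  // allocate a codec context and fill it with the stream's codec parameters
  AVCodecContext *pCodecContext = avcodec_alloc_context3(pCodec);
  avcodec_parameters_to_context(pCodecContext, pCodecParameters);
  // open the codec so we can use it for decoding
  avcodec_open2(pCodecContext, pCodec, NULL);
}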
Now we're going to read the packets from the stream and decode them into
frames but first, we need to allocate memory for both components, the
AVPacket and AVFrame.
Let's feed our packets from the streams with the function av_read_frame
while it has packets.
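In code, the allocation and the read loop can look roughly like this (a sketch, error checks omitted):
AVPacket *pPacket = av_packet_alloc();
AVFrame *pFrame = av_frame_alloc();
// keep fetching packets from the format context while there are packets left
while (av_read_frame(pFormatContext, pPacket) >= 0) {
  // decode the packet here (avcodec_send_packet / avcodec_receive_frame, shown below)
  av_packet_unref(pPacket);
}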
Let's send the raw data packet (compressed frame) to the decoder,
through the codec context, using the function avcodec_send_packet.
avcodec_send_packet(pCodecContext, pPacket);
And let's receive the raw data frame (uncompressed frame) from the
decoder, through the same codec context, using the function
avcodec_receive_frame.
avcodec_receive_frame(pCodecContext, pFrame);
We can print the frame number, the PTS, DTS, frame type, etc.
printf(
    "Frame %c (%d) pts %" PRId64 " dts %" PRId64 " key_frame %d [coded_picture_number %d, display_picture_number %d]",
    av_get_picture_type_char(pFrame->pict_type),
    pCodecContext->frame_number,
    pFrame->pts,     // pts and pkt_dts are int64_t, hence PRId64 (from <inttypes.h>)
    pFrame->pkt_dts,
    pFrame->key_frame,
    pFrame->coded_picture_number,
    pFrame->display_picture_number
);
Finally, we can save our decoded frame into a simple gray image. The
process is very simple: we'll use pFrame->data , where the index is
related to the planes Y, Cb and Cr; we just picked 0 (Y) to save our gray
image.
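Here's a sketch of such a saving routine, writing the Y plane as a PGM file (error handling omitted; frame_filename in the usage line is just a placeholder for the output path):
static void save_gray_frame(unsigned char *buf, int wrap, int xsize, int ysize, char *filename)
{
    FILE *f = fopen(filename, "w");
    // minimal PGM header: magic number, dimensions and max gray value
    fprintf(f, "P5\n%d %d\n%d\n", xsize, ysize, 255);
    // write line by line because the linesize (wrap) can be larger than the width
    for (int i = 0; i < ysize; i++)
        fwrite(buf + i * wrap, 1, xsize, f);
    fclose(f);
}
// usage: save_gray_frame(pFrame->data[0], pFrame->linesize[0], pFrame->width, pFrame->height, frame_filename);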
And voilà! Now we have a gray scale image with 2MB:
In the last example, we saved some frames that can be seen here:
Chapter 1 - syncing audio and video
Now, with the pts_time , we can find a way to render this synched with the
audio pts_time or with a system clock. FFmpeg libav provides this
info through its API:
fps = AVStream->avg_frame_rate
tbr = AVStream->r_frame_rate
tbn = AVStream->time_base
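For instance, here's a minimal sketch of turning a frame's pts into seconds (assuming a video_stream_index variable with the chosen stream and the decoded pFrame from before):
AVRational time_base = pFormatContext->streams[video_stream_index]->time_base;
// pts is counted in time_base units; av_q2d converts the rational to a double
double pts_time = pFrame->pts * av_q2d(time_base);
// PRId64 comes from <inttypes.h>, since pts is an int64_t
printf("pts %" PRId64 " -> %f seconds\n", pFrame->pts, pts_time);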
Just out of curiosity, the frames we saved were sent in DTS order (frames:
1,6,4,2,3,5) but played in PTS order (frames: 1,2,3,4,5). Also, notice how
cheap B-Frames are in comparison to P or I-Frames.
LOG: AVStream->r_frame_rate 60/1
LOG: AVStream->time_base 1/60000
...
LOG: Frame 1 (type=I, size=153797 bytes) pts 6000 key_frame 1 [DTS 0]
LOG: Frame 2 (type=B, size=8117 bytes) pts 7000 key_frame 0 [DTS 3]
LOG: Frame 3 (type=B, size=8226 bytes) pts 8000 key_frame 0 [DTS 4]
LOG: Frame 4 (type=B, size=17699 bytes) pts 9000 key_frame 0 [DTS 2]
LOG: Frame 5 (type=B, size=6253 bytes) pts 10000 key_frame 0 [DTS 5]
LOG: Frame 6 (type=P, size=34992 bytes) pts 11000 key_frame 0 [DTS 1]
Chapter 2 - remuxing
Remuxing is the act of changing from one format (container) to another; for
instance, we can change a MPEG-4 video to a MPEG-TS one without much
pain using FFmpeg:
$ ffmpeg input.mp4 -c copy output.ts
It'll demux the mp4 but it won't decode or encode it ( -c copy ) and in the
end, it'll mux it into an mpegts file. If you don't provide the format -f ,
ffmpeg will try to guess it based on the file's extension.
This graph is strongly inspired by Leixiaohua's and Slhck's works.
Now let's code an example using libav to provide the same effect as in
ffmpeg input.mp4 -c copy output.ts .
We start by doing the usual: allocate memory and open the input format. For
this specific case, we're going to open an input file and allocate memory for
an output file.
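A minimal sketch of that setup (error handling omitted; in_filename and out_filename are assumed to hold the input and output paths):
AVFormatContext *input_format_context = NULL, *output_format_context = NULL;
// open the input and read its stream information
avformat_open_input(&input_format_context, in_filename, NULL, NULL);
avformat_find_stream_info(input_format_context, NULL);
// allocate the output context; the muxer is guessed from the output file name
avformat_alloc_output_context2(&output_format_context, NULL, NULL, out_filename);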
We're going to remux only the video, audio and subtitle types of streams, so
we're holding the streams we'll be using in an array of indexes.
number_of_streams = input_format_context->nb_streams;
streams_list = av_mallocz_array(number_of_streams, sizeof(*streams_list));
Just after we allocated the required memory, we're going to loop through
all the streams, and for each one we need to create a new output stream in our
output format context, using the avformat_new_stream function. Notice
that we're marking all the streams that aren't video, audio or subtitle so we
can skip them later.
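That loop can look roughly like this (a sketch, error checks omitted):
int stream_index = 0;
for (unsigned int i = 0; i < input_format_context->nb_streams; i++) {
  AVStream *in_stream = input_format_context->streams[i];
  AVCodecParameters *in_codecpar = in_stream->codecpar;
  if (in_codecpar->codec_type != AVMEDIA_TYPE_AUDIO &&
      in_codecpar->codec_type != AVMEDIA_TYPE_VIDEO &&
      in_codecpar->codec_type != AVMEDIA_TYPE_SUBTITLE) {
    streams_list[i] = -1; // mark it so we can skip this stream later
    continue;
  }
  streams_list[i] = stream_index++;
  AVStream *out_stream = avformat_new_stream(output_format_context, NULL);
  // we are not re-encoding, so copying the codec parameters is enough
  avcodec_parameters_copy(out_stream->codecpar, in_codecpar);
}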
After that, we can copy the streams, packet by packet, from our input to our
output streams. We'll loop while there are packets ( av_read_frame ); for each
packet we need to re-calculate the PTS and DTS to finally write it
( av_interleaved_write_frame ) to our output format context.
while (1) {
  AVStream *in_stream, *out_stream;
  ret = av_read_frame(input_format_context, &packet);
  if (ret < 0)
    break;
  in_stream = input_format_context->streams[packet.stream_index];
  if (packet.stream_index >= number_of_streams || streams_list[packet.stream_index] < 0) {
    av_packet_unref(&packet);
    continue;
  }
  packet.stream_index = streams_list[packet.stream_index];
  out_stream = output_format_context->streams[packet.stream_index];
  /* copy packet */
  packet.pts = av_rescale_q_rnd(packet.pts, in_stream->time_base, out_stream->time_base, AV_ROUND_NEAR_INF|AV_ROUND_PASS_MINMAX);
  packet.dts = av_rescale_q_rnd(packet.dts, in_stream->time_base, out_stream->time_base, AV_ROUND_NEAR_INF|AV_ROUND_PASS_MINMAX);
  packet.duration = av_rescale_q(packet.duration, in_stream->time_base, out_stream->time_base);
  // https://ffmpeg.org/doxygen/trunk/structAVPacket.html#ab5793d8195cf4789d
  packet.pos = -1;
  // https://ffmpeg.org/doxygen/trunk/group__lavf__encoding.html#ga37352ed
  ret = av_interleaved_write_frame(output_format_context, &packet);
  if (ret < 0)
    break;
  av_packet_unref(&packet);
}
To finalize, we need to write the stream trailer to the output media file with
the av_write_trailer function.
av_write_trailer(output_format_context);
Now we're ready to test it, and the first test will be a format (video container)
conversion from an MP4 to a MPEG-TS video file. We're basically making the
command line ffmpeg input.mp4 -c copy output.ts with libav.
make run_remuxing_ts
It's working!!! don't you trust me?! you shouldn't, we can check it with
ffprobe :
ffprobe -i remuxed_small_bunny_1080p_60fps.ts
To sum up what we did here in a graph, we can revisit our initial idea about
how libav works but showing that we skipped the codec part.
Before we end this chapter, I'd like to show an important part of the
remuxing process: you can pass options to the muxer. Let's say we
want to deliver the MPEG-DASH format; for that matter we need to use
fragmented mp4 (sometimes referred to as fmp4 ) instead of MPEG-TS or
plain MPEG-4.
The libav version of it is almost as easy as the command line: we just
need to pass the options when writing the output header, just before the
packet copy.
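A minimal sketch of passing those options (assuming a fragmented_mp4_options flag that decides when to ask for fragments):
AVDictionary *opts = NULL;
if (fragmented_mp4_options)
  // ask the mp4 muxer to produce a fragmented mp4 (fmp4)
  av_dict_set(&opts, "movflags", "frag_keyframe+empty_moov+default_base_moof", 0);
// the muxer consumes the options while writing the output header
avformat_write_header(output_format_context, &opts);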
We now can generate this fragmented mp4 file:
make run_remuxing_fragmented_mp4
But to make sure that I'm not lying to you, you can use the amazing site/tool
gpac/mp4box.js or the site http://mp4parser.com/ to see the differences;
first load up the "common" mp4.
Chapter 3 - transcoding
$ make run_transcoding
We'll skip some details, but don't worry: the source code is available at
github.
Just a quick recap: the AVFormatContext is the abstraction for the format
of the media file, aka container (ex: MKV, MP4, Webm, TS). The AVStream
represents each type of data for a given format (ex: audio, video, subtitle,
metadata). The AVPacket is a slice of compressed data obtained from the
AVStream that can be decoded by an AVCodec (ex: av1, h264, vp9, hevc),
generating raw data called an AVFrame.
Transmuxing
Let's start with the simple transmuxing operation and then we can build
upon this code; the first step is to load the input file.
// Allocate an AVFormatContext.
avfc = avformat_alloc_context();
// Open an input stream and read the header.
avformat_open_input(&avfc, in_filename, NULL, NULL);
// Read packets of a media file to get stream information.
avformat_find_stream_info(avfc, NULL);
Now we're going to set up the decoder. The AVFormatContext will give us
access to all the AVStream components, and for each one of them we can
get their AVCodec and create the particular AVCodecContext , and finally
we can open the given codec so we can proceed to the decoding process.
The AVCodecContext holds data about media configuration such as bit rate,
frame rate, sample rate, channels, height, and many others.
We need to prepare the output media file for transmuxing as well; we first
allocate memory for the output AVFormatContext . We create each
stream in the output format. In order to pack the stream properly, we copy
the codec parameters from the decoder.
avformat_alloc_output_context2(&encoder_avfc, NULL, NULL, out_filename);
We're getting the AVPacket 's from the decoder, adjusting the timestamps,
and writing the packet properly to the output file. Even though the function
av_interleaved_write_frame says "write frame", we are storing the
packet. We finish the transmuxing process by writing the stream trailer to
the file.
av_write_trailer(encoder_avfc);
Transcoding
The previous section showed a simple transmuxer program; now we're going
to add the capability to encode files, specifically enabling it to
transcode videos from h264 to h265 .
After we prepared the decoder, but before we arrange the output media file,
we're going to set up the encoder.
AVRational input_framerate = av_guess_frame_rate(decoder_avfc, decoder_video_avs, NULL);
AVStream *video_avs = avformat_new_stream(encoder_avfc, NULL);
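The rest of the encoder setup can look roughly like this (a sketch, assuming a decoder_video_avcc codec context coming from the decoder; the bit rate and the x265 parameters are just illustrative values):
const AVCodec *video_avc = avcodec_find_encoder_by_name("libx265");
AVCodecContext *encoder_video_avcc = avcodec_alloc_context3(video_avc);
// codec-private options (like x265-params) go through av_opt_set on priv_data
av_opt_set(encoder_video_avcc->priv_data, "x265-params", "keyint=60:min-keyint=60:scenecut=0", 0);
encoder_video_avcc->height = decoder_video_avcc->height;
encoder_video_avcc->width = decoder_video_avcc->width;
encoder_video_avcc->pix_fmt = decoder_video_avcc->pix_fmt;
encoder_video_avcc->bit_rate = 2 * 1000 * 1000;
// the encoder time base is the inverse of the input frame rate we just guessed
encoder_video_avcc->time_base = av_inv_q(input_framerate);
video_avs->time_base = encoder_video_avcc->time_base;
avcodec_open2(encoder_video_avcc, video_avc, NULL);
avcodec_parameters_from_context(video_avs->codecpar, encoder_video_avcc);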
We need to expand our decoding loop for the video stream transcoding:
AVFrame *input_frame = av_frame_alloc();
AVPacket *input_packet = av_packet_alloc();

// the encoding function used inside that loop
int encode(AVFormatContext *avfc, AVStream *dec_video_avs, AVStream *enc_video_avs,
           AVCodecContext *video_avcc, AVFrame *input_frame, int index) {
  AVPacket *output_packet = av_packet_alloc();

  // send the uncompressed frame to the encoder
  int response = avcodec_send_frame(video_avcc, input_frame);

  // drain every packet the encoder has ready for us
  while (response >= 0) {
    response = avcodec_receive_packet(video_avcc, output_packet);
    if (response == AVERROR(EAGAIN) || response == AVERROR_EOF)
      break;

    output_packet->stream_index = index;
    output_packet->duration = enc_video_avs->time_base.den / enc_video_avs->time_base.num /
                              dec_video_avs->avg_frame_rate.num * dec_video_avs->avg_frame_rate.den;

    av_packet_rescale_ts(output_packet, dec_video_avs->time_base, enc_video_avs->time_base);
    response = av_interleaved_write_frame(avfc, output_packet);
  }

  av_packet_unref(output_packet);
  av_packet_free(&output_packet);
  return 0;
}
/*
* H264 -> H265
* Audio -> remuxed (untouched)
* MP4 - MP4
*/
StreamingParams sp = {0};
sp.copy_audio = 1;
sp.copy_video = 0;
sp.video_codec = "libx265";
sp.codec_priv_key = "x265-params";
sp.codec_priv_value = "keyint=60:min-keyint=60:scenecut=0";
/*
* H264 -> H264 (fixed gop)
* Audio -> remuxed (untouched)
* MP4 - MP4
*/
StreamingParams sp = {0};
sp.copy_audio = 1;
sp.copy_video = 0;
sp.video_codec = "libx264";
sp.codec_priv_key = "x264-params";
sp.codec_priv_value = "keyint=60:min-keyint=60:scenecut=0:force-cfr=1";
/*
* H264 -> H264 (fixed gop)
* Audio -> remuxed (untouched)
* MP4 - fragmented MP4
*/
StreamingParams sp = {0};
sp.copy_audio = 1;
sp.copy_video = 0;
sp.video_codec = "libx264";
sp.codec_priv_key = "x264-params";
sp.codec_priv_value = "keyint=60:min-keyint=60:scenecut=0:force-cfr=1";
sp.muxer_opt_key = "movflags";
sp.muxer_opt_value = "frag_keyframe+empty_moov+default_base_moof";
/*
* H264 -> H264 (fixed gop)
* Audio -> AAC
* MP4 - MPEG-TS
*/
StreamingParams sp = {0};
sp.copy_audio = 0;
sp.copy_video = 0;
sp.video_codec = "libx264";
sp.codec_priv_key = "x264-params";
sp.codec_priv_value = "keyint=60:min-keyint=60:scenecut=0:force-cfr=1";
sp.audio_codec = "aac";
sp.output_extension = ".ts";
/* WIP :P -> it's not playing on VLC, the final bit rate is huge
* H264 -> VP9
* Audio -> Vorbis
* MP4 - WebM
*/
//StreamingParams sp = {0};
//sp.copy_audio = 0;
//sp.copy_video = 0;
//sp.video_codec = "libvpx-vp9";
//sp.audio_codec = "libvorbis";
//sp.output_extension = ".webm";
Now, to be honest, this was harder than I thought it'd be, and I had to dig into
the FFmpeg command line source code and test it a lot, and I think I'm
missing something, because I had to enforce force-cfr for the h264 to
work and I'm still seeing some warning messages like (forced frame type
(5) at 80 was changed to frame type (3)) .