FFmpeg libav tutorial
github.com/leandromoreira/ffmpeg-libav-tutorial
Most of the code in here will be in C, but don't worry: you can easily
understand and apply it to your preferred language. FFmpeg libav has lots
of bindings for many languages, like Python and Go, and even if your language
doesn't have one, you can still use it through FFI (here's an example
with Lua).
We'll start with a quick lesson about what video, audio, codecs and
containers are, then we'll move to a crash course on how to use the FFmpeg
command line, and finally we'll write code. Feel free to skip directly to the
section Learn FFmpeg libav the Hard Way.
Some people used to say that Internet video streaming is the future of
traditional TV; in any case, FFmpeg is something that is worth
studying.
Intro
audio - what you listen!
Although a muted video can express a variety of feelings, adding sound to it
brings more pleasure to the experience.
Sound is the vibration that propagates as a wave of pressure, through the air
or any other transmission medium, such as a gas, liquid or solid.
container
A single file that contains all the streams (mostly the audio and video),
and it also provides synchronization and general metadata, such as
title, resolution, etc.
Usually we can infer the format of a file by looking at its extension: for
instance a video.webm is probably a video using the container webm.
FFmpeg has a command line program called ffmpeg , a very simple yet powerful
binary. For instance, you can convert from mp4 to the container avi just
by typing the following command:
$ ffmpeg -i input.mp4 output.avi
To make things short, the FFmpeg command line program expects the
following argument format to perform its actions ffmpeg {1} {2} -i {3}
{4} {5} , where:
1. global options
2. input file options
3. input url
4. output file options
5. output url
The parts 2, 3, 4 and 5 can be as many as you need. It's easier to understand
this argument format in action:
$ ffmpeg \
-y \ # global options
-c:a libfdk_aac -c:v libx264 \ # input options
-i bunny_1080p_60fps.mp4 \ # input url
-c:v libvpx-vp9 -c:a libvorbis \ # output options
bunny_1080p_60fps_vp9.webm # output url
This command takes an input mp4 file containing two streams (an audio
encoded with the aac CODEC and a video encoded with the h264 CODEC) and
converts it to webm , changing its audio and video CODECs too.
We could simplify the command above but then be aware that FFmpeg will
adopt or guess the default values for you. For instance when you just type
ffmpeg -i input.avi output.mp4 what audio/video CODEC does it use
to produce the output.mp4 ?
Transcoding
What? the act of converting one of the streams (audio or video) from one
CODEC to another one.
Why? sometimes some devices (TVs, smartphones, consoles, etc.) don't
support X but do support Y, and newer CODECs provide better compression rates.
$ ffmpeg \
-i bunny_1080p_60fps.mp4 \
-c:v libx265 \
bunny_1080p_60fps_h265.mp4
Transmuxing
What? the act of converting from one format (container) to another one.
Why? sometimes some devices (TVs, smartphones, consoles, etc.) don't
support X but do support Y, and sometimes newer containers provide modern
required features.
$ ffmpeg \
-i bunny_1080p_60fps.mp4 \
-c copy \ # just saying to ffmpeg to skip encoding
bunny_1080p_60fps.webm
Transrating
What? the act of changing the bit rate, or producing other renditions.
Why? people will try to watch your video on a 2G (EDGE) connection using
a less powerful smartphone or over a fiber Internet connection on their 4K
TVs, therefore you should offer more than one rendition of the same video
with different bit rates.
How? producing a rendition with bit rate between 3856K and 2000K.
$ ffmpeg \
-i bunny_1080p_60fps.mp4 \
-minrate 964K -maxrate 3856K -bufsize 2000K \
bunny_1080p_60fps_transrating_964_3856.mp4
Transsizing
What? the act of converting from one resolution to another one. As said
before transsizing is often used with transrating.
How? converting a 1080p to a 480p resolution.
$ ffmpeg \
-i bunny_1080p_60fps.mp4 \
-vf scale=480:-1 \
bunny_1080p_60fps_transsizing_480.mp4
Adaptive Streaming
What? the act of producing many resolutions (bit rates), splitting the media
into chunks and serving them via HTTP.
# video streams
$ ffmpeg -i bunny_1080p_60fps.mp4 -c:v libvpx-vp9 -s 160x90 -b:v 250k \
-keyint_min 150 -g 150 -an -f webm -dash 1 video_160x90_250k.webm
# audio streams
$ ffmpeg -i bunny_1080p_60fps.mp4 -c:a libvorbis -b:a 128k -vn -f webm \
-dash 1 audio_128k.webm
PS: I stole this example from the Instructions to playback Adaptive WebM
using DASH
Going beyond
There are many, many other usages for FFmpeg. I use it in conjunction
with iMovie to produce/edit some videos for YouTube, and you can certainly
use it professionally.
Don't you wonder sometimes 'bout sound and vision? David Robert Jones
FFmpeg is composed of several libraries that can be integrated into our own
programs. Usually, when you install FFmpeg, it installs all these libraries
automatically. I'll be referring to the set of these libraries as FFmpeg
libav.
Learn FFmpeg libav the Hard Way
This title is a homage to Zed Shaw's series Learn X the Hard Way,
particularly his book Learn C the Hard Way.
You'll first need to load your media file into a component called
AVFormatContext (the video container is also known as format). It actually
doesn't fully load the whole file: it often only reads the header.
Once we've loaded the minimal header of our container, we can access its
streams (think of them as rudimentary audio and video data). Each stream
will be available in a component called AVStream.
Suppose our video has two streams: an audio encoded with AAC CODEC
and a video encoded with H264 (AVC) CODEC. From each stream we can
extract pieces (slices) of data called packets that will be loaded into
components named AVPacket.
The data inside the packets is still coded (compressed), and in order
to decode the packets, we need to pass them to a specific AVCodec.
The AVCodec will decode them into an AVFrame and finally, this component
gives us the uncompressed frame. Notice that the same
terminology/process is used by both the audio and video streams.
Requirements
Since some people were facing issues while compiling or running the
examples, we're going to use Docker as our development/runner
environment. We'll also use the big buck bunny video, so if you don't have
it locally just run the command make fetch_small_bunny_video .
$ make run_hello
We'll skip some details, but don't worry: the source code is available at
github.
Now we're going to open the file and read its header and fill the
AVFormatContext with minimal information about the format (notice that
usually the codecs are not opened). The function used to do this is
avformat_open_input. It expects an AVFormatContext , a filename
and two optional arguments: the AVInputFormat (if you pass NULL ,
FFmpeg will guess the format) and the AVDictionary (which are the
options to the demuxer).
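Here's a minimal sketch of that call (error handling omitted, and assuming a filename variable holding the path to the media file):
AVFormatContext *pFormatContext = avformat_alloc_context();
// open the file, read its header and fill the format context with minimal information;
// the two NULLs let FFmpeg guess the input format and pass no demuxer options
avformat_open_input(&pFormatContext, filename, NULL, NULL);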
To access the streams , we need to read data from the media. The function
avformat_find_stream_info does that. Now, the pFormatContext-
>nb_streams will hold the amount of streams and the pFormatContext-
>streams[i] will give us the i stream (an AVStream).
avformat_find_stream_info(pFormatContext, NULL);
Now we'll loop through all the streams.
With the codec properties we can look up the proper CODEC with the
function avcodec_find_decoder, which finds the registered decoder for the
codec id and returns an AVCodec, the component that knows how to enCOde
and DECode the stream.
With the codec, we can allocate memory for the AVCodecContext, which will
hold the context for our decode/encode process, but then we need to fill this
codec context with CODEC parameters; we do that with
avcodec_parameters_to_context.
Once we filled the codec context, we need to open the codec. We call the
function avcodec_open2 and then we can use it.
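Putting those steps together, the loop can look roughly like this (a sketch with error checks omitted; in the real program you would also remember which stream is the video one):
for (unsigned int i = 0; i < pFormatContext->nb_streams; i++) {
  AVCodecParameters *pCodecParameters = pFormatContext->streams[i]->codecpar;
  // find the registered decoder for this codec id
  const AVCodec *pCodec = avcodec_find_decoder(pCodecParameters->codec_id);
  // allocate a codec context and fill it with the stream's codec parameters
  AVCodecContext *pCodecContext = avcodec_alloc_context3(pCodec);
  avcodec_parameters_to_context(pCodecContext, pCodecParameters);
  // open the codec so we can use it for decoding
  avcodec_open2(pCodecContext, pCodec, NULL);
}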
Now we're going to read the packets from the stream and decode them into
frames but first, we need to allocate memory for both components, the
AVPacket and AVFrame.
Let's feed our packets from the streams with the function av_read_frame
while it has packets.
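In code, the allocation and the read loop can look roughly like this (a sketch, error checks omitted):
AVPacket *pPacket = av_packet_alloc();
AVFrame *pFrame = av_frame_alloc();
// keep fetching packets from the format context while there are packets left
while (av_read_frame(pFormatContext, pPacket) >= 0) {
  // decode the packet here (avcodec_send_packet / avcodec_receive_frame, shown below)
  av_packet_unref(pPacket);
}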
Let's send the raw data packet (compressed frame) to the decoder,
through the codec context, using the function avcodec_send_packet.
avcodec_send_packet(pCodecContext, pPacket);
And let's receive the raw data frame (uncompressed frame) from the
decoder, through the same codec context, using the function
avcodec_receive_frame.
avcodec_receive_frame(pCodecContext, pFrame);
We can print the frame number, the PTS, DTS, frame type, etc.
printf(
    "Frame %c (%d) pts %" PRId64 " dts %" PRId64 " key_frame %d [coded_picture_number %d, display_picture_number %d]",
    av_get_picture_type_char(pFrame->pict_type),
    pCodecContext->frame_number,
    pFrame->pts,     // pts and pkt_dts are int64_t, hence PRId64 (from <inttypes.h>)
    pFrame->pkt_dts,
    pFrame->key_frame,
    pFrame->coded_picture_number,
    pFrame->display_picture_number
);
Finally, we can save our decoded frame into a simple gray image. The
process is very simple: we'll use pFrame->data , where the index is
related to the planes Y, Cb and Cr; we just picked 0 (Y) to save our gray
image.
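Here's a sketch of such a saving routine, writing the Y plane as a PGM file (error handling omitted; frame_filename in the usage line is just a placeholder for the output path):
static void save_gray_frame(unsigned char *buf, int wrap, int xsize, int ysize, char *filename)
{
    FILE *f = fopen(filename, "w");
    // minimal PGM header: magic number, dimensions and max gray value
    fprintf(f, "P5\n%d %d\n%d\n", xsize, ysize, 255);
    // write line by line because the linesize (wrap) can be larger than the width
    for (int i = 0; i < ysize; i++)
        fwrite(buf + i * wrap, 1, xsize, f);
    fclose(f);
}
// usage: save_gray_frame(pFrame->data[0], pFrame->linesize[0], pFrame->width, pFrame->height, frame_filename);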
And voilà! Now we have a gray scale image with 2MB:
In the last example, we saved some frames that can be seen here:
Chapter 1 - syncing audio and video
Now, with the pts_time , we can find a way to render this synched with the
audio pts_time or with a system clock. FFmpeg libav provides this
info through its API:
fps = AVStream->avg_frame_rate
tbr = AVStream->r_frame_rate
tbn = AVStream->time_base
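For instance, here's a minimal sketch of turning a frame's pts into seconds (assuming a video_stream_index variable with the chosen stream and the decoded pFrame from before):
AVRational time_base = pFormatContext->streams[video_stream_index]->time_base;
// pts is counted in time_base units; av_q2d converts the rational to a double
double pts_time = pFrame->pts * av_q2d(time_base);
// PRId64 comes from <inttypes.h>, since pts is an int64_t
printf("pts %" PRId64 " -> %f seconds\n", pFrame->pts, pts_time);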
Just out of curiosity, the frames we saved were sent in DTS order (frames:
1,6,4,2,3,5) but played in PTS order (frames: 1,2,3,4,5). Also, notice how
cheap B-Frames are in comparison to P or I-Frames.
LOG: AVStream->r_frame_rate 60/1
LOG: AVStream->time_base 1/60000
...
LOG: Frame 1 (type=I, size=153797 bytes) pts 6000 key_frame 1 [DTS 0]
LOG: Frame 2 (type=B, size=8117 bytes) pts 7000 key_frame 0 [DTS 3]
LOG: Frame 3 (type=B, size=8226 bytes) pts 8000 key_frame 0 [DTS 4]
LOG: Frame 4 (type=B, size=17699 bytes) pts 9000 key_frame 0 [DTS 2]
LOG: Frame 5 (type=B, size=6253 bytes) pts 10000 key_frame 0 [DTS 5]
LOG: Frame 6 (type=P, size=34992 bytes) pts 11000 key_frame 0 [DTS 1]
Chapter 2 - remuxing
Remuxing is the act of changing from one format (container) to another; for
instance, we can change a MPEG-4 video to a MPEG-TS one without much
pain using FFmpeg:
$ ffmpeg input.mp4 -c copy output.ts
It'll demux the mp4 but it won't decode or encode it ( -c copy ) and in the
end, it'll mux it into an mpegts file. If you don't provide the format -f ,
ffmpeg will try to guess it based on the file's extension.
This graph is strongly inspired by Leixiaohua's and Slhck's works.
Now let's code an example using libav to provide the same effect as in
ffmpeg input.mp4 -c copy output.ts .
We start by doing the usual: allocate memory and open the input format. For
this specific case, we're going to open an input file and allocate memory for
an output file.
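A minimal sketch of that setup (error handling omitted; in_filename and out_filename are assumed to hold the input and output paths):
AVFormatContext *input_format_context = NULL, *output_format_context = NULL;
// open the input and read its stream information
avformat_open_input(&input_format_context, in_filename, NULL, NULL);
avformat_find_stream_info(input_format_context, NULL);
// allocate the output context; the muxer is guessed from the output file name
avformat_alloc_output_context2(&output_format_context, NULL, NULL, out_filename);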
We're going to remux only the video, audio and subtitle types of streams, so
we're holding the streams we'll be using in an array of indexes.
number_of_streams = input_format_context->nb_streams;
streams_list = av_mallocz_array(number_of_streams, sizeof(*streams_list));
Just after we allocated the required memory, we're going to loop through
all the streams, and for each one we need to create a new output stream in our
output format context, using the avformat_new_stream function. Notice
that we're marking all the streams that aren't video, audio or subtitle so we
can skip them later.
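That loop can look roughly like this (a sketch, error checks omitted):
int stream_index = 0;
for (unsigned int i = 0; i < input_format_context->nb_streams; i++) {
  AVStream *in_stream = input_format_context->streams[i];
  AVCodecParameters *in_codecpar = in_stream->codecpar;
  if (in_codecpar->codec_type != AVMEDIA_TYPE_AUDIO &&
      in_codecpar->codec_type != AVMEDIA_TYPE_VIDEO &&
      in_codecpar->codec_type != AVMEDIA_TYPE_SUBTITLE) {
    streams_list[i] = -1; // mark it so we can skip this stream later
    continue;
  }
  streams_list[i] = stream_index++;
  AVStream *out_stream = avformat_new_stream(output_format_context, NULL);
  // we are not re-encoding, so copying the codec parameters is enough
  avcodec_parameters_copy(out_stream->codecpar, in_codecpar);
}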
After that, we can copy the streams, packet by packet, from our input to our
output streams. We'll loop while there are packets ( av_read_frame ); for each
packet we need to re-calculate the PTS and DTS to finally write it
( av_interleaved_write_frame ) to our output format context.
while (1) {
  AVStream *in_stream, *out_stream;
  ret = av_read_frame(input_format_context, &packet);
  if (ret < 0)
    break;
  in_stream = input_format_context->streams[packet.stream_index];
  if (packet.stream_index >= number_of_streams || streams_list[packet.stream_index] < 0) {
    av_packet_unref(&packet);
    continue;
  }
  packet.stream_index = streams_list[packet.stream_index];
  out_stream = output_format_context->streams[packet.stream_index];
  /* copy packet */
  packet.pts = av_rescale_q_rnd(packet.pts, in_stream->time_base, out_stream->time_base, AV_ROUND_NEAR_INF|AV_ROUND_PASS_MINMAX);
  packet.dts = av_rescale_q_rnd(packet.dts, in_stream->time_base, out_stream->time_base, AV_ROUND_NEAR_INF|AV_ROUND_PASS_MINMAX);
  packet.duration = av_rescale_q(packet.duration, in_stream->time_base, out_stream->time_base);
  // https://ffmpeg.org/doxygen/trunk/structAVPacket.html#ab5793d8195cf4789d
  packet.pos = -1;
  // https://ffmpeg.org/doxygen/trunk/group__lavf__encoding.html#ga37352ed
  ret = av_interleaved_write_frame(output_format_context, &packet);
  if (ret < 0)
    break;
  av_packet_unref(&packet);
}
To finalize, we need to write the stream trailer to the output media file with
the av_write_trailer function.
av_write_trailer(output_format_context);
Now we're ready to test it, and the first test will be a format (video container)
conversion from an MP4 to a MPEG-TS video file. We're basically making the
command line ffmpeg input.mp4 -c copy output.ts with libav.
make run_remuxing_ts
It's working!!! don't you trust me?! you shouldn't, we can check it with
ffprobe :
ffprobe -i remuxed_small_bunny_1080p_60fps.ts
To sum up what we did here in a graph, we can revisit our initial idea about
how libav works but showing that we skipped the codec part.
Before we end this chapter, I'd like to show an important part of the
remuxing process: you can pass options to the muxer. Let's say we
want to deliver the MPEG-DASH format; for that matter we need to use
fragmented mp4 (sometimes referred to as fmp4 ) instead of MPEG-TS or
plain MPEG-4.
The libav version of it is almost as easy as the command line: we just
need to pass the options when writing the output header, just before the
packet copy.
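A minimal sketch of passing those options (assuming a fragmented_mp4_options flag that decides when to ask for fragments):
AVDictionary *opts = NULL;
if (fragmented_mp4_options)
  // ask the mp4 muxer to produce a fragmented mp4 (fmp4)
  av_dict_set(&opts, "movflags", "frag_keyframe+empty_moov+default_base_moof", 0);
// the muxer consumes the options while writing the output header
avformat_write_header(output_format_context, &opts);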
We now can generate this fragmented mp4 file:
make run_remuxing_fragmented_mp4
But to make sure that I'm not lying to you, you can use the amazing site/tool
gpac/mp4box.js or the site http://mp4parser.com/ to see the differences;
first load up the "common" mp4.
Chapter 3 - transcoding
$ make run_transcoding
We'll skip some details, but don't worry: the source code is available at
github.
Just a quick recap: the AVFormatContext is the abstraction for the format
of the media file, aka container (ex: MKV, MP4, Webm, TS). The AVStream
represents each type of data for a given format (ex: audio, video, subtitle,
metadata). The AVPacket is a slice of compressed data obtained from the
AVStream that can be decoded by an AVCodec (ex: av1, h264, vp9, hevc),
generating raw data called an AVFrame.
Transmuxing
Let's start with the simple transmuxing operation and then we can build
upon this code; the first step is to load the input file.
// Allocate an AVFormatContext.
avfc = avformat_alloc_context();
// Open an input stream and read the header.
avformat_open_input(&avfc, in_filename, NULL, NULL);
// Read packets of a media file to get stream information.
avformat_find_stream_info(avfc, NULL);
Now we're going to set up the decoder. The AVFormatContext will give us
access to all the AVStream components, and for each one of them we can
get their AVCodec and create the particular AVCodecContext , and finally
we can open the given codec so we can proceed to the decoding process.
The AVCodecContext holds data about media configuration such as bit rate,
frame rate, sample rate, channels, height, and many others.
We need to prepare the output media file for transmuxing as well; we first
allocate memory for the output AVFormatContext . We create each
stream in the output format. In order to pack the stream properly, we copy
the codec parameters from the decoder.
avformat_alloc_output_context2(&encoder_avfc, NULL, NULL, out_filename);
We're getting the AVPacket 's from the decoder, adjusting the timestamps,
and writing the packet properly to the output file. Even though the function
av_interleaved_write_frame says "write frame", we are storing the
packet. We finish the transmuxing process by writing the stream trailer to
the file.
av_write_trailer(encoder_avfc);
Transcoding
The previous section showed a simple transmuxer program; now we're going
to add the capability to encode files, specifically enabling it to
transcode videos from h264 to h265 .
After we prepared the decoder, but before we arrange the output media file,
we're going to set up the encoder.
AVRational input_framerate = av_guess_frame_rate(decoder_avfc, decoder_video_avs, NULL);
AVStream *video_avs = avformat_new_stream(encoder_avfc, NULL);
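The rest of the encoder setup can look roughly like this (a sketch, assuming a decoder_video_avcc codec context coming from the decoder; the bit rate and the x265 parameters are just illustrative values):
const AVCodec *video_avc = avcodec_find_encoder_by_name("libx265");
AVCodecContext *encoder_video_avcc = avcodec_alloc_context3(video_avc);
// codec-private options (like x265-params) go through av_opt_set on priv_data
av_opt_set(encoder_video_avcc->priv_data, "x265-params", "keyint=60:min-keyint=60:scenecut=0", 0);
encoder_video_avcc->height = decoder_video_avcc->height;
encoder_video_avcc->width = decoder_video_avcc->width;
encoder_video_avcc->pix_fmt = decoder_video_avcc->pix_fmt;
encoder_video_avcc->bit_rate = 2 * 1000 * 1000;
// the encoder time base is the inverse of the input frame rate we just guessed
encoder_video_avcc->time_base = av_inv_q(input_framerate);
video_avs->time_base = encoder_video_avcc->time_base;
avcodec_open2(encoder_video_avcc, video_avc, NULL);
avcodec_parameters_from_context(video_avs->codecpar, encoder_video_avcc);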
We need to expand our decoding loop for the video stream transcoding:
AVFrame *input_frame = av_frame_alloc();
AVPacket *input_packet = av_packet_alloc();

// the encoding function used inside that loop
int encode(AVFormatContext *avfc, AVStream *dec_video_avs, AVStream *enc_video_avs,
           AVCodecContext *video_avcc, AVFrame *input_frame, int index) {
  AVPacket *output_packet = av_packet_alloc();

  // send the uncompressed frame to the encoder
  int response = avcodec_send_frame(video_avcc, input_frame);

  // drain every packet the encoder has ready for us
  while (response >= 0) {
    response = avcodec_receive_packet(video_avcc, output_packet);
    if (response == AVERROR(EAGAIN) || response == AVERROR_EOF)
      break;

    output_packet->stream_index = index;
    output_packet->duration = enc_video_avs->time_base.den / enc_video_avs->time_base.num /
                              dec_video_avs->avg_frame_rate.num * dec_video_avs->avg_frame_rate.den;

    av_packet_rescale_ts(output_packet, dec_video_avs->time_base, enc_video_avs->time_base);
    response = av_interleaved_write_frame(avfc, output_packet);
  }

  av_packet_unref(output_packet);
  av_packet_free(&output_packet);
  return 0;
}
/*
* H264 -> H265
* Audio -> remuxed (untouched)
* MP4 - MP4
*/
StreamingParams sp = {0};
sp.copy_audio = 1;
sp.copy_video = 0;
sp.video_codec = "libx265";
sp.codec_priv_key = "x265-params";
sp.codec_priv_value = "keyint=60:min-keyint=60:scenecut=0";
/*
* H264 -> H264 (fixed gop)
* Audio -> remuxed (untouched)
* MP4 - MP4
*/
StreamingParams sp = {0};
sp.copy_audio = 1;
sp.copy_video = 0;
sp.video_codec = "libx264";
sp.codec_priv_key = "x264-params";
sp.codec_priv_value = "keyint=60:min-keyint=60:scenecut=0:force-cfr=1";
/*
* H264 -> H264 (fixed gop)
* Audio -> remuxed (untouched)
* MP4 - fragmented MP4
*/
StreamingParams sp = {0};
sp.copy_audio = 1;
sp.copy_video = 0;
sp.video_codec = "libx264";
sp.codec_priv_key = "x264-params";
sp.codec_priv_value = "keyint=60:min-keyint=60:scenecut=0:force-cfr=1";
sp.muxer_opt_key = "movflags";
sp.muxer_opt_value = "frag_keyframe+empty_moov+default_base_moof";
/*
* H264 -> H264 (fixed gop)
* Audio -> AAC
* MP4 - MPEG-TS
*/
StreamingParams sp = {0};
sp.copy_audio = 0;
sp.copy_video = 0;
sp.video_codec = "libx264";
sp.codec_priv_key = "x264-params";
sp.codec_priv_value = "keyint=60:min-keyint=60:scenecut=0:force-cfr=1";
sp.audio_codec = "aac";
sp.output_extension = ".ts";
/* WIP :P -> it's not playing on VLC, the final bit rate is huge
* H264 -> VP9
* Audio -> Vorbis
* MP4 - WebM
*/
//StreamingParams sp = {0};
//sp.copy_audio = 0;
//sp.copy_video = 0;
//sp.video_codec = "libvpx-vp9";
//sp.audio_codec = "libvorbis";
//sp.output_extension = ".webm";
Now, to be honest, this was harder than I thought it'd be, and I had to dig into
the FFmpeg command line source code and test it a lot, and I think I'm
missing something, because I had to enforce force-cfr for the h264 to
work and I'm still seeing some warning messages like (forced frame type
(5) at 80 was changed to frame type (3)) .