FFmpeg Raspi
Electronics
Bachelor’s Thesis
14 May 2020
Abstract
The main objective of this project was to create a program for the Raspberry Pi that shows a live video feed on a display as a form of surveillance using an IP (Internet Protocol) camera. The goal was to build a simple program on the portable Raspberry Pi, establish a connection with an IP camera over the local network, acquire the live video stream and assess how secure the setup is.
This project uses a Raspberry Pi 3B+ as the main processing device. The other main device is a TP-Link NC450 IP camera, which records the live video streaming sessions. Any screen that supports VGA or HDMI can be used as the display. The Pi is powered by a 64-bit Broadcom BCM2837B0 quad-core Cortex-A53 processor and has a Gigabit Ethernet port and a wifi chip for connecting to a network. The TP-Link NC450 is a wireless HD PTZ camera with a 75-degree lens view. It has an Ethernet port for a wired connection and built-in wifi for wireless network connectivity, and it uses the H.264 AVC video codec standard.
The connection between the camera and the Pi is established mainly over RTSP. The camera has its own RTSP stream link, which allows RTSP clients on the network to connect to it. An important software package, FFmpeg, is installed from the terminal of the Pi. It handles the H.264/AVC format from the TP-Link NC450, encoding and decoding it as needed. The decoded stream can then be used from the RTSP link in a Python script run on the Pi. The script opens the stream and displays the live video feed.
The project resulted in a successful display of the live video stream using RTSP. The stream ran at up to the native 30 fps, which is the maximum frame rate supported by the camera, with video latency of less than a second on the display. The overall setup can be employed as a private indoor surveillance system, and with the Pi used as a headless system the video stream can be viewed remotely.
1 Introduction
2 Theoretical Overview
2.1 Linux
2.1.1 Introduction to Linux
2.1.2 Linux vs MacOS vs Windows
2.1.3 Raspbian
2.1.4 V4L
2.1.5 OpenCV
2.1.6 FFmpeg
2.2 Introduction to Raspberry Pi
2.2.1 What is Raspberry Pi
2.2.2 Python Programming
2.3 IP Camera
2.3.1 IP Cameras and Video Codec Standards
2.3.2 RTSP
2.3.3 How RTSP Communicates With IP Camera
2.3.4 SIP
2.3.5 ONVIF
2.3.6 Bonjour
2.3.7 Security
3.1 Raspberry Pi
3.2 TP-Link NC450
3.3 Motion Eye OS
3.4 Scripting and RTSP Stream Testing
4 Results
5 Conclusion
References
Appendices
Appendix 1. First code
Appendix 2. Second script (VLC)
List of Abbreviations
Pi Raspberry Pi
PS Program Stream
Py Python
SD Secure Digital
TS Transport Stream
V4L Video4Linux
1 Introduction
As the world has been progressing with modern technologies and innovations, security has always remained a major threat and consideration. Many ideas and practices have been introduced over the years to improve security. The term security is not limited to one entity; its meaning is vast and can be divided into many branches. However, this particular project focuses on security as evidence of physical and/or potential risks on private grounds, and on security as a measure taken by an installation to ensure safety from hostile influences. When it comes to monitoring by installation or deployment, several methods have been devised over the years. A good example is a guard dog kept for security purposes to look after a private property. Although it is an effective way to monitor the surroundings, like many things it has its pros and cons. This project considers the introduction of camera surveillance systems. Depending on the nature of the environment being monitored, a camera can be installed strategically: it can be mounted on walls or the ceiling, or simply placed on a table top, counter or shelf. The main idea is for the camera to record or live stream footage and send it to a central system, which is then viewed on a display. In the past, wired setups were common and coaxial cables were used to tie the system together. With modern systems, wireless technology has been integrated and old fashioned wired connections are no longer necessary. There are many different types of cameras used in surveillance systems, including box, bullet, dome, day/night, thermal FLIR (Forward Looking Infrared) and IP (Internet Protocol) cameras.
This project uses a wireless PTZ (Pan Tilt Zoom) IP camera for recording the live stream and a Raspberry Pi as the main system, which then sends the live footage to an HD (High Definition) display. The main system used is a Pi 3B+. This setup is very cost effective, as Pis are cheap and, being pocket sized, convenient to use. The system runs on a Linux OS and uses a Python script which displays the live stream on a display connected via HDMI/VGA. The IP camera used is an H.264 wireless camera from TP-Link, and it also supports night vision.
2 Theoretical Overview
2.1 Linux
2.1.1 Introduction to Linux
Linux is an operating system family affiliated with Unix-like systems. These systems are based on the Linux kernel, an operating system kernel first released on September 17, 1991 by Linus Torvalds, and are usually packaged as Linux distributions. What originally started off as an OS for desktop computers soon came to be used widely on many other platforms, and today it is the only operating system used on the TOP500 supercomputers. [1.]
Linux can best be described as an OS that runs on packages and distributions released under open source licenses. It is written mainly in the C language. Debian, Ubuntu and Fedora are some of the widely used distributions, used both personally and commercially. Linux was initially developed on the Intel x86 platform but has since been ported to a vast number of other platforms. The Unix-like kernel, in other words the Linux kernel, is responsible for many important tasks in a Linux system, including handling file systems, access to hardware and peripherals, process control and networking. Drivers can be added to a Linux system separately as modules, and in the same way separate libraries and other software programs can be added. The user interface of commonly used desktop systems employs a CLI (Command Line Interface) and a GUI (Graphical User Interface), with the GUI being the default. The CLI is commonly used via a terminal emulator, which uses text for input and output to control the many tasks and installations of the Linux system. When it comes to programming, Linux distributions support many programming languages thanks to GCC (GNU Compiler Collection). GCC allows languages such as Ada, C, C++, Go and Fortran to be compiled. However, many other languages like PHP, Python, Java and Ruby are also readily available for Linux distributions. This is possible due to the cross-platform reference implementation support these languages provide for Linux. [1.]
2.1.2 Linux vs MacOS vs Windows
The three main operating systems used throughout the globe are Windows, Mac OS and Linux. The three differ considerably from each other, and each OS has its own pros and cons. Both Mac OS and Windows started off as GUIs, whereas Linux was first designed for GNU developers. Windows is the most widely used OS in the world and accounts for about 90% of usage, with Mac OS in second place at about 7% and Linux distributions at a mere 1% [2]. Because both Windows and Mac OS have a huge user base, they are also the most targeted by malware and spyware, whereas Linux has a very low probability of catching malware. Mac OS is the most costly to use, as Mac OS users have to purchase a Mac system built by Apple Inc. Compared to Windows and Mac OS, Linux has a different file structure and a completely different code base; all drives are mounted under a single tree structure. The command prompt for Windows, also known as the Windows Command Processor, is used to execute commands and run other administrative functions. Mac OS has a terminal interface which is used to run commands and explore directories. Just like Mac OS, Linux also uses a terminal to run commands, explore directories, install packages and perform other administrative functions. Figure 1 below shows the differences in registry handling between the three operating systems. [3.]
Figure 1: Registry comparison between the three operating systems. Reprinted from
[3.]
For some users, having interchangeable interfaces can be a key factor in deciding which OS to opt for. Easily switchable GUIs mean that users can run multiple programs simultaneously without much knowledge of programming. A GUI also provides visual feedback and makes the system easy to learn and operate. Figure 2 below contrasts the interchangeable interface options of the three operating systems and how they benefit users [3].
2.1.3 Raspbian
respectively, and later on the Pi 3+ model too. The latest version of Raspbian, Buster, was released at the same time as the latest Raspberry Pi model 4. It should be noted that all these versions of Raspbian are still in active use and under constant development.
The high level of optimization of this OS makes it very suitable for the low-performance ARM Cortex CPU based Raspberry Pis. Raspbian employs PIXEL (Pi Improved X-Window Environment Lightweight) as its desktop environment, which is based on the LXDE desktop platform. [4.] The idea is for Raspberry Pi users to be able to enjoy a desktop-like environment on their Pis, which have low processing speed and memory. Figure 3 below shows the Raspbian desktop of the Pi 3B+ used in this project.
2.1.4 V4L
V4L, which stands for Video4Linux, is a collection of device drivers written in the C language and used for real-time video capture on Linux. It also functions as an API (Application Programming Interface) for Linux. It supports most IP cameras and other TV/radio related device interfaces, and programmers can add video capture support to applications that run on the V4L framework. V4L was initially introduced during the 2.1.x development cycle of the Linux kernel as V4L1. Later a newer, upgraded version known as V4L2 was introduced for the 2.5.x Linux kernel, which fixed some design flaws. [5.] The latter includes a compatibility mode for V4L1 applications. Each IP camera has its own URI (Uniform Resource Identifier) which helps the V4L drivers identify the particular device. http://username:[email protected]:8080/path/file?action=stream is a typical representation of a device URI. Some of the software programs supported by V4L include ZoneMinder, FFmpeg, VLC media player, libav, OpenCV, Skype, Motion and many more.
2.1.5 OpenCV
OpenCV (Open Source Computer Vision Library), originally developed by Intel in 1999, can best be described as an open source software library used mainly for machine learning and computer vision. It is written natively in C and C++. OpenCV is compatible with Windows, MacOS, Linux, FreeBSD, NetBSD and OpenBSD. The library contains more than 2500 algorithms, covering both machine learning and computer vision. C++ is the primary interface for most of the algorithms, although API bindings for other programming languages such as Java, Python and MATLAB can be found on the official OpenCV website, under the online documentation tab. [7.] Having grown popular over the years, its wide range of applications includes facial recognition, motion detection, image processing, mobile robotics, gesture recognition, video surveillance, image stitching, object identification, augmented reality and more.
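As a minimal sketch of how the library is used from Python (assuming the opencv-python package is installed and a V4L2 camera is present as device 0; the device index is an assumption), a single frame can be captured and saved as follows:

import cv2

# Open the first V4L2 video device (/dev/video0); an RTSP URL string could be
# passed instead of the index when reading from a network camera.
cap = cv2.VideoCapture(0)
ret, frame = cap.read()                    # grab one frame; ret is False on failure
if ret:
    cv2.imwrite('test_frame.jpg', frame)   # save the frame as a quick sanity check
cap.release()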
2.1.6 FFmpeg
Besides the main ffmpeg command line tool, the FFmpeg project includes two more command line tools, ffplay and ffprobe. Ffplay is a media player which employs the FFmpeg libraries. Ffprobe is a multimedia analyzer which displays media information. [8.] When v4l-utils support is compiled into FFmpeg (by applying the --enable-libv4l2 configure option), the input device option -use_libv4l2 becomes available. When a v4l2 device is connected to a system running FFmpeg, Linux identifies it and creates a device file node for it. The device, which can be an IP camera or any other video capture device, will exist in the form /dev/videoN, where N is a number assigned automatically to the device (usually 0). To test which frame formats and width x height sizes the v4l2 device supports, the -list_formats all option can be used. Different devices support different standards and video codecs; to find out all the supported standards, the -list_standards all option can be applied. Listings 1 and 2 below are examples of v4l devices being used with FFmpeg. [15.]
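As an additional illustrative sketch (not one of the listings above), the same -list_formats query can be issued from Python using the subprocess module, assuming FFmpeg is installed and a camera is present at /dev/video0; both are assumptions here:

import subprocess

# Ask FFmpeg's v4l2 input device to print every pixel format and frame size
# that /dev/video0 supports; FFmpeg writes this listing to stderr and then exits.
result = subprocess.run(
    ["ffmpeg", "-f", "v4l2", "-list_formats", "all", "-i", "/dev/video0"],
    capture_output=True, text=True
)
print(result.stderr)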
2.2 Introduction to Raspberry Pi
2.2.1 What is Raspberry Pi
The Raspberry Pi is a low-cost single-board computer that is widely used in many digital projects. The Raspberry Pi model used in this project is the model B+ from the Raspberry Pi 3 family. It employs a 64-bit quad-core 1.4 GHz ARM Cortex-A53 processor. The Pi 3B+ board has four USB 2.0 ports along with an Ethernet port. For audio output, a 3.5 mm jack is available. A composite video jack, an HDMI port and a DSI (Display Serial Interface) for LCD panels are also provided. The model 3B+ has a 40-pin GPIO header. There is no need for a separate wifi dongle, as wifi/Bluetooth is already integrated. Figure 4 below shows the Pi used in this project.
2.2.2 Python Programming
Python is a programming language first made public in 1991. Python has a large standard library, and its main programming paradigms include object-oriented, imperative and functional programming. A simple example of a Python program, computing the factorial of a number, is shown below.
n = int(input('Enter a number: '))
if n < 0:
    raise ValueError('You must enter a positive number')
fact = 1
i = 2
while i <= n:
    fact = fact * i
    i += 1
print(fact)
Figure 5 shows the Thonny IDE Python environment on the Pi 3B+ used in the project.
2.3 IP Camera
An IP camera, which stands for Internet Protocol camera, is a type of digital camera that receives and sends data, in the form of images or video, over a local network or the internet. When IP cameras were relatively new, they were connected to the network via a wired connection such as Ethernet; newer, improved wireless IP cameras are now widely used. Major uses include monitoring, security and surveillance systems. IP cameras can be used privately in households and on private properties, or commercially on a large scale in industries and other organizations. Previous generations of cameras used older TV formats like NTSC, PAL and SECAM; current formats include HD, 1080p Full HD and 4K Ultra HD, typically in 16:9 aspect ratio. One of the main reasons why IP cameras have become common in households is how easy the installation process is: users can easily mount cameras on walls, ceilings or the roof, or simply place them on flat surfaces and shelves. A big advantage of using wifi IP cameras is that users can easily access the live video feed and control their cameras from mobile devices. There are mobile applications specific to the camera in use through which users can control the camera, provided that they are connected to the local network via wifi or Ethernet.
As briefly mentioned in the introduction, there are several different types of IP cameras. They range from dome cameras, bullet cameras, box cameras and PoE cameras to PTZ cameras. Some cameras are manufactured for outdoor use while others are best suited for indoor surveillance, and the installation process may differ for each type of camera. It should also be noted that before purchasing an IP camera there are some things to take into consideration, such as the resolution, the lens, the video recording system, the dynamic range and the low-light capability. One of the major differences between IP cameras is which audio/video standards they actually support. Newly developed IP cameras are bound to support the latest audio/video compression standards. Therefore it is important to understand and compare the respective audio/video standards supported by IP cameras.
MPEG-2
MPEG-2 is best defined as a digital audio/video compression standard used for the storage or transmission of digital video and the associated audio. Its ITU (International Telecommunication Union) equivalents are H.222/H.262. It is important to note that MPEG-2 is not to be confused with MP2, the audio layer two of MPEG-1. MPEG-2 was introduced by MPEG (Moving Picture Experts Group) as its second standard, and its specifications come under ISO/IEC 13818. There are eleven parts, each covering a section of the MPEG-2 specification. Parts 1, 2, 3 and 7 are explained as follows: [9.]
Part 1 is the Systems section of MPEG-2 and is also known as H.222 as per the ITU. This section describes two container formats known as TS (Transport Stream) and PS (Program Stream). TS is a digital media container format mainly used for the storage and transmission of audio and video data and is designed to carry streams over less reliable channels. This type of transmission is employed by satellite broadcast and broadcast systems such as IPTV (Internet Protocol Television), DVB (Digital Video Broadcasting) and HDV (High Definition Video). TS uses .ts, .tsv and .tsa as filename extensions, and the M2TS variant is used for Blu-ray discs and HDV. The other container format, PS, is intended for more reliable random access storage such as flash memory and hard disk drives. PS is used for multiplexing digital audio and video and uses the .mpg, .mpeg, .m2p and .ps filename extensions, which are further extended to VOB, MOD and EVO. PS is commonly used for DVD-Video and HD DVD discs. [9.] (A small remultiplexing sketch illustrating these container formats is given after this list.)
The second part of the MPEG-2 standard is the video section, also referred to as H.262 (ITU-T Rec.). It is very similar to the previous MPEG-1 standard but adds support for interlaced video, which means that for a given video display the perceived frame rate can be doubled without using extra bandwidth; the video then contains two fields of a frame captured consecutively. This makes it ideal for analog broadcast television, as flicker is reduced too. MPEG-2 video decoders are backwards compatible, so devices with MPEG-2 video decoders can also play MPEG-1 video. [9.]
The audio section is Part 3 of the standard. It allows multichannel audio of up to 5.1 channels and is also backwards compatible: audio compressed with it can still be decoded by MPEG-1 decoders, with the result that the stereo audio tracks can be played. [9.]
Part 7, also known as AAC (Advanced Audio Coding), is a distinct audio format. Unlike the Part 3 audio section, AAC is not backwards compatible. However, with support for up to 48 channels at higher sampling rates and with multilingual and multi-program capabilities, AAC proves to be more efficient than its predecessors. [9.]
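To make the role of these containers concrete, the short sketch below (an illustration only, not part of the project) uses FFmpeg to remultiplex an existing file into an MPEG transport stream without re-encoding; the input filename is a placeholder.

import subprocess

# Copy the audio and video bitstreams of a placeholder input file into a
# transport stream container; "-c copy" avoids re-encoding, so only the
# container format changes, not the compressed media itself.
subprocess.run(["ffmpeg", "-i", "input.mp4", "-c", "copy", "output.ts"], check=True)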
H.263
H.263 is a video codec designed for video communication, in particular video conferencing, and intended to operate at low sampling bit rates. It is a successor to the previous standards H.261 and H.262 and was therefore designed with improvements. It consists of three versions to date. The first version, released in 1995, replaced H.261 and was later improved into versions H.263v2 and H.263v3. H.263 was also used as the basis for MPEG-4 Part 2 and can be used for bidirectional visual communication. Because it was designed for lower bit-rate video conferencing, operating at more than a 50 kbps data rate may disrupt performance on low end systems. It is supported by FFmpeg, which means that libraries such as libavcodec can be used to decompress the H.263 video format, and it is therefore used by media players like VLC and MPlayer. Moreover, H.263 was used by internet giants like YouTube, MySpace and Google Video for their Flash videos. H.263 also subsequently saw use with protocols like RTP (Real Time Transport Protocol), RTSP (Real Time Streaming Protocol) and SIP (Session Initiation Protocol). Compared with H.261, H.263 provides better video quality. [10.]
H.264
H.264 is Part 10 of the MPEG-4 standard, first published in 2003. It was developed jointly by the ITU-T and ISO/IEC and is the most popular and widely used video codec. It is commonly referred to as AVC (Advanced Video Coding) and is mainly used for recording, compressing and distributing HD video content. The main aim in developing this codec was to enhance video compression at even lower bit rates compared to the older H.263 and H.261 video codecs. This was made possible by the addition of an integer DCT (Discrete Cosine Transform). Flexibility was another achievement of H.264, as it was intended to offer higher and lower sampling bit rates, higher and lower video resolutions, DVD storage, and use by a vast number of internet applications and online video streaming services. As previously mentioned, it is one of the three formats supported by Blu-ray discs and the most widely used of them. Popular streaming services such as Netflix, Hulu, Amazon Prime Video, YouTube and the iTunes Store all make heavy use of H.264. In the CCTV video surveillance market, many CCTV cameras and IP cameras have included the H.264 format in their latest products. H.264 has three commonly used profiles: the Main, Extended and Baseline profiles. The Main profile is used for broadcasting such as HDTV, the Extended profile is used for video streaming purposes, and the Baseline profile is employed for services such as video conferencing. [11.] Comparing H.264 with H.263, the latter was designed to operate at low bit rates only, whereas H.264 has the option of encoding both high and low bit-rate video. H.264 also allows a lower overall cost at a much more efficient video yield. Compared with its predecessor, H.264 requires lower bandwidth, less storage for compressed video and shorter download times. Although H.264 brought a major improvement to the video compression and streaming industry, it still struggles with UHD (Ultra High Definition) content, where it consumes more bandwidth while providing a lower fps (frames per second). To counter these issues, an even better and more enhanced H.265 video codec has been designed.
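Since the choice of codec matters for the rest of this project, one practical way to confirm which codec an IP camera stream actually uses is to probe it with ffprobe (introduced in section 2.1.6). The sketch below is illustrative only; the RTSP URL, username and password match the NC450 address used later in this work and are assumptions here.

import json
import subprocess

# RTSP URL of the camera as used later in this project; credentials and
# address are assumptions and must match the actual device.
URL = "rtsp://admin:[email protected]:554/h264_hd.sdp"

probe = subprocess.run(
    ["ffprobe", "-v", "error", "-print_format", "json", "-show_streams", URL],
    capture_output=True, text=True
)
for stream in json.loads(probe.stdout or "{}").get("streams", []):
    # The NC450's video stream is expected to report codec_name "h264".
    print(stream.get("codec_type"), stream.get("codec_name"))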
H.265
H.265, or MPEG-H Part 2, also commonly referred to as HEVC (High Efficiency Video Coding), is a video compression standard and the successor to the widely used and popular H.264. Its design further enhances the video compression capabilities of H.264, which means that H.265 can operate at even lower sampling bit rates while yielding much better video quality. It offers support for up to 8K UHD and is able to compress 4K UHD video content at approximately 10 Mbps. Compared with AVC, HEVC can compress data 25% to 50% better. Similar to AVC, HEVC also employs the DCT, but with the addition of the DST (Discrete Sine Transform). [12.]
The working principle is the same as that of H.264. Both operate by identifying redundant areas in a video sample through comparison between different parts of a frame; this process takes place within single frames and between consecutive frames. Instead of keeping the original pixels in redundant places, they are replaced by a short description. The major difference that separates HEVC from AVC is that HEVC uses block sizes from 16x16 up to 64x64 pixels for pattern comparison. Other changes include enhanced motion vector prediction and motion compensation filtering. [12.] However, H.265 as a whole is costly compared to the cheaper H.264, and AVC is still the most widely used video compression standard, followed by HEVC. Comparing H.263, H.264 and H.265, each successive codec is more efficient than the previous one, making HEVC the most efficient of them all. An example of an H.263 device is shown in figure 6 below [16].
2.3.2 RTSP
RTSP, which stands for Real Time Streaming Protocol, is an application-level protocol used to control the delivery of multimedia streams directly over a network. As the name implies, the media is transferred and streamed in real time. The protocol establishes communication between two endpoints, the client and the server. Multimedia streams, which can be audio and video, can be transported either from client to server (voice/video recording) or from server to client (video on demand). RTSP servers do not usually stream the media themselves; they typically rely on other transport protocols such as RTP (Real Time Transport Protocol) and RTCP (Real Time Control Protocol). The media being transferred can be in the form of saved clips or live video feeds. [13.] RTSP can be considered similar to HTTP, as both use TCP (Transmission Control Protocol) to maintain an end-to-end connection. However, unlike HTTP, where the server retains no session information, RTSP uses a session identifier so that the server can keep track of each session. During a session, the protocol can open and close many transport connections to deliver RTSP requests. While RTP and TCP are mainly used for streaming and multimedia transport, the connectionless UDP (User Datagram Protocol) is also used for data transfer. [13; 14.]
RTSP mainly supports three operations. The first is inviting a media server into an existing conference: a media server can be requested to join a conference, where it can then either record part or all of the conference or play back media content into it. This makes RTSP very suitable for online teaching and business meetings. The protocol can also add media to a live presentation, with the server informing the client about any additional media becoming available. The most common operation, however, is the retrieval of media from a server: the client fetches a presentation description from the server via HTTP or some other means, and in the case of multicast streams the description contains the multicast addresses and port numbers to be used. [14.] There are several important basic RTSP request methods which indicate what is to be done with the resource identified by the Request-URI. All of these methods are case sensitive. They are explained as follows:
The OPTIONS request returns the request methods that the server will accept. An OPTIONS request can be made at any time and can be issued in both directions, from client to server and vice versa. It is a required method. [14.]
The PLAY request allows the media stream to be played after a SETUP request has been successful. A PLAY request can specify a range, in which case the stream is played within that range and playback is automatically paused as soon as the range is completed. If no range is specified, the PLAY request plays the media from beginning to end. This method also allows queuing, which means that if a PLAY request is active and another PLAY request is issued, the second request is played after the first one has completed. This method is required and operates in the client-to-server direction. [14.]
The PAUSE request, as the name suggests, temporarily halts the playback of a media stream. It can stop playback of one or more streams depending on the number of active streams. If a PAUSE request is addressed to a named media stream, only that stream is halted. A paused media stream can resume playback with a PLAY request. For audio, a PAUSE request is equivalent to muting. A range can be specified in the request, in which case the stream is halted at the specified point. This method is recommended. [14.]
The ANNOUNCE method is used for two things. When the client sends the request to the server, it posts the presentation description, identified by the request URL, to the server. When the server sends it to the client, it updates the session description in real time. This method is optional. [14.]
For a specific URI, the TEARDOWN request terminates the delivery of the media stream. Session identifiers belonging to the torn-down session are no longer valid, and in order to start a new session a SETUP request with the required parameters has to be issued. A TEARDOWN request can only be sent from client to server, and it is a required method. [14.]
All of the above requests can be sent from client to server. They operate on presentation and stream objects, with the exception of SETUP, which can only operate on a stream. RTSP has brought about a major step forward in the computing industry. Before RTSP, media on the internet had to be downloaded and saved before it could be viewed; RTSP made it possible to access multimedia streams over the network directly. There are many reasons why this protocol has grown so successful and widely implemented. The protocol is extensible, meaning that newer methods and parameters can be added. Due to its similarity with HTTP, it can be parsed by HTTP parsers. RTSP is transport independent and can use either the unreliable UDP or the reliable TCP for transporting media streams. RTSP is able to handle multiple servers at one time, and an RTSP client can establish several ongoing sessions with different servers. Furthermore, RTSP can also manage VCR (Video Cassette Recorder) style devices and devices that allow recording or playback only. As mentioned earlier, RTSP is useful for online communication and services, mainly video conferencing and online lecturing, and can be used to allow digital editing of media streams remotely. RTSP controls the servers in such a way that each stream can be played and stopped independently. [14.]
2.3.3 How RTSP Communicates With IP Camera
Every media stream and IP camera device has its own RTSP URL. In this context, a presentation can be explained as a set of media streams that are controlled together. The IP camera has its own RTSP URL which allows RTSP clients to identify it. It is important to understand that every presentation and media stream has a presentation description file. This file defines the contents and properties of the IP camera's media stream and basically the overall presentation. The description file can be obtained by the RTSP client through email or HTTP, and it is not necessarily stored on the server. A presentation may contain more than one media stream; similarly, a presentation description can comprise more than one presentation. Assuming the presentation description file reports one presentation, as it does in this case with the IP camera, the file contains the language used, the encodings, the transport method(s) the server can handle and other parameters. These parameters allow the RTSP client to decide which media is most suitable to request. When the RTSP URL has been identified and the stream is being controlled over RTSP, the URL tells the handling server which media stream is being managed and under which names it is stored on the server. The audio and video streams of the camera can be located in different server locations, which means that media streams can be found on different servers. Apart from the description file containing the media parameters, some other factors also need to be determined, including the port number of the IP camera and the network destination. In the event of live media streaming, the media server selects the port and multicast address. [14.] Figure 7 shows the popular VLC media player being used as an RTSP client.
When an RTSP client like VLC media player sends an RTSP OPTIONS request to the server connected to the IP camera source, the server replies with the request methods, such as PLAY, PAUSE and RECORD, that it accepts via RTSP. The server streaming the media from the IP camera then receives a description file request (DESCRIBE) from VLC and answers with the description file, which contains the media parameters, language and encodings. VLC then issues a SETUP request, and the server indicates which transport methods (UDP/TCP) it can support. After the SETUP request has completed successfully, the session starts when the server sends the bit stream to the client. The bit stream is transmitted to VLC using the transport method that the server can manage and that was specified in the SETUP request. The whole RTSP session process is shown in figure 8 below.
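To make the exchange above concrete, the following minimal sketch sends the first two requests of such a session (OPTIONS and DESCRIBE) over a raw TCP socket. It is illustrative only: the camera IP address and stream path are the ones assumed elsewhere in this project, a real camera will typically answer DESCRIBE with a 401 response until credentials are supplied, and an actual client such as VLC handles authentication, SETUP and PLAY on top of this.

import socket

CAMERA = ("192.168.1.100", 554)                   # assumed camera address, default RTSP port
URL = "rtsp://192.168.1.100:554/h264_hd.sdp"      # stream path used later in this project

def rtsp_request(sock, method, cseq):
    # RTSP is text based: a request is a request line plus CRLF-terminated headers.
    request = f"{method} {URL} RTSP/1.0\r\nCSeq: {cseq}\r\n\r\n"
    sock.sendall(request.encode("ascii"))
    return sock.recv(4096).decode("ascii", errors="replace")

with socket.create_connection(CAMERA, timeout=5) as sock:
    print(rtsp_request(sock, "OPTIONS", 1))       # the server lists the methods it accepts
    print(rtsp_request(sock, "DESCRIBE", 2))      # the server returns the presentation description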
2.3.4 SIP
SIP (Session Initiation Protocol) is an application-layer communication protocol that is mainly responsible for establishing, maintaining and ending real-time end-to-end sessions in IP telephony. The protocol is mainly used in ordinary telephone connections, video calling and voice calling over the internet. Instant messaging and text messaging also employ SIP. It is a text-based protocol and shares many similarities with HTTP and SMTP (Simple Mail Transfer Protocol). Like RTSP, a SIP session can contain multiple streams of media content. SIP operates together with other protocols such as SDP (Session Description Protocol), UDP, TCP and SCTP (Stream Control Transmission Protocol). SDP provides the media types and media setup for SIP, and RTP is further used for carrying the media streams, such as audio and video data, between terminals. SIP is able to handle unicast and even multicast streams. For an ongoing session, SIP allows changes to be made, such as changing a port, adding more recipients, and adding or removing streams. SDP is responsible for specifying the format of the stream, the codec standard of the stream and the communication protocol. In a SIP session there are SIP user agents and SIP servers. User agents, which can be IP phones, mobile phones and so on, are the endpoints of a SIP session, while SIP servers allow other user agents to be located. SIP has 14 different request methods, the most common of which include INVITE, ACK, CANCEL, BYE and REGISTER. A SIP URI is used to indicate a user's SIP address. In a SIP call session, there are four important steps for a successful connection between two users. [18.]
1. The registration process happens when a user starts their user agent, for example an IP phone. The REGISTER request method is issued and the phone registers with a SIP server. After it has been registered, it is discoverable by other user agents. [18.]
2. The second step is the call establishment process, in which one user agent tries to connect to another and a number of different request methods are issued. The SIP INVITE request starts the connection process (an example INVITE message is sketched after this list). The message includes the receiving agent's SIP URI and is sent to the SIP server, which identifies the intended receiver. The second message, SIP response 100 (Trying), is sent back to the caller by the server to confirm that the previous INVITE request is being handled. After the INVITE request has been received successfully by the receiving user agent, the SIP response 180 (Ringing) informs the sender that the receiving user agent is alerting its user of the call. A SIP response 200 (OK) message is sent back to the sender when the receiver accepts the call invitation. The final step in the establishment process is the ACK request sent by the sender to the receiver, after which the VoIP call commences. [18.]
3. The actual VoIP call data between both users is carried by transport protocols such as RTP. A VoIP call can contain audio alone or audio and video together. [18.]
4. The last step in the session is to end the communication between the two users. This is achieved by the BYE request: either user can issue a SIP BYE request to terminate the VoIP call, and the other party answers with a 200 OK response. [18.]
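As an illustration of the text-based nature of SIP, the sketch below prints a minimal INVITE request of the kind sent in step 2. The addresses, tags and Call-ID are invented placeholders following the layout given in the SIP specification; a real message is generated by the user agent.

# Hypothetical SIP INVITE request; all names, hosts and identifiers are placeholders.
invite = (
    "INVITE sip:[email protected] SIP/2.0\r\n"
    "Via: SIP/2.0/UDP alicepc.example.com;branch=z9hG4bK776asdhds\r\n"
    "Max-Forwards: 70\r\n"
    "To: Bob <sip:[email protected]>\r\n"
    "From: Alice <sip:[email protected]>;tag=1928301774\r\n"
    "Call-ID: [email protected]\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Contact: <sip:[email protected]>\r\n"
    "Content-Length: 0\r\n\r\n"   # a real INVITE usually carries an SDP body describing the media
)
print(invite)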
2.3.5 ONVIF
ONVIF (Open Network Video Interface Forum) is an open standard and global forum founded by Bosch Security Systems, Sony and Axis Communications in 2008. The main objective of ONVIF is to provide and maintain standardized interfaces that support the interoperability of physical IP-based security systems, so that they can effectively operate with other products or systems. These standardized interfaces allow different organizations and brands to create IP-based security products that work with each other. ONVIF is an open forum which is open to all organizations and manufacturers, and more than 12000 conformant products are listed. In order for a product to be compliant with ONVIF, it must support at least one of the six ONVIF profiles. These profiles help identify conformant products and whether they are compatible with other products; however, there are many manufacturers and products that falsely claim to be ONVIF conformant. Each profile has distinct features that have to be present in an ONVIF conformant device and client, and a conformant client with Profile T will work with a conformant device with Profile T. There are six ONVIF profiles: A, C, G, Q, S and T. Profiles A and C are used for access control, whereas Profiles G, Q, S and T are employed in IP video networking products. [17.]
Profile A is used for access control configuration. Devices with Profile A can retrieve information and configure credentials and access rules, and a Profile A client can provide such credentials and configured access rules. [17.]
Profile C is also used for access control. Like Profile A, devices and clients with Profile C support door access control, site information, alarm management and so on. [17.]
Profile G is made for video systems and is mainly responsible for recording control and storage. A Profile G video device can record video over an IP network or on the device itself, and a Profile G client can control and retrieve the recorded video data from the Profile G device. [17.]
Profile Q is used for the setup and discovery of other conformant devices. A video networking device bearing Profile Q is easily discovered by a Profile Q client, and the client can also control the conformant device over the network. [17.]
Profile S is mainly made for basic video streaming in IP-based video devices and cameras. A Profile S video device can transmit video to a Profile S client over a network, and a Profile S client is able to configure and control the video data being sent from the conformant device. Profile S also includes PTZ control, relay outputs and audio input for devices and clients that carry such features. [17.]
2.3.6 Bonjour
Bonjour is a protocol created by Apple Inc. that allows the discovery of other devices on a network and hence enables communication between devices connected to the same network. It allows network devices to be found without the need to configure DNS servers or IP address entries. What was originally designed to help locate Apple devices and products within a single LAN is now also used between Windows and Apple systems to share devices such as printers, and many large IT firms now employ Bonjour in gateways and protocols that allow different networks to interoperate. [19.] Upon logging in to the account of the IP camera used in this project, it can be seen that the Bonjour name assigned to the camera is NC450 2.0-1ab651. When Bonjour is installed on Windows and the setup is completed, this Bonjour name can be found in the Bonjour search list without entering any IP address.
2.3.7 Security
When the products and devices in use are manufactured precisely to protect privacy and monitor safety, the important question always arises: are IP cameras really secure? Security is an important consideration nowadays when a consumer is using products for the sole purpose of video monitoring and surveillance. IP cameras with different capabilities are employed either in private households or on a larger scale in industries and corporations. It is important to understand that everything connected to a network is prone to falling into the wrong hands: cyber criminals can infiltrate online systems if security is not strong enough. IP cameras, like other network devices, come with their fair share of security features. An IP camera, be it a wired or a wireless wifi camera, connects to servers online and is assigned its own IP address. If a camera is not set up correctly, it is very easy for it to be compromised. Weak login credentials are simple to crack, for example with a key logger, and a known IP address can be used to log in to a camera's online administration interface. There even exist several search engines that expose vulnerable camera devices connected to the internet. Therefore it is important to question an IP camera's security and the measures taken to ensure it. Many IP cameras are designed with support for two-factor authentication, which adds an extra layer of security: attackers who find out a password will still not get past the second factor. The camera used in this project does not support such a feature. If the camera supports the WPA2 protocol, the data transferred over the wireless network to the router is encrypted.
3.1 Raspberry Pi
The Raspberry Pi used in this project is the Pi 3B+. It has an integrated wifi chip for wireless network connectivity, is powered by a quad-core 64-bit 1.4 GHz processor and offers faster Ethernet as well as PoE (Power over Ethernet) support. The Pi was chosen to serve as the main IoT device of the live video surveillance system; its main task is to run a programmed script that shows the live video feed on a display. It runs Raspbian, a Debian-based Linux OS, and therefore all the packages and software libraries used in this project are purely Linux based. After the Pi had been set up and booted for the first time, SSH (Secure Shell) and VNC (Virtual Network Computing) were enabled in the Pi configuration settings. This allows the Pi to be controlled remotely and used as a headless system, removing the need for external peripherals. In order to use VNC, VNC Viewer was installed on Windows. The IP address of the Pi also has to be determined so that VNC on Windows can connect to it; typing hostname -I in the terminal window of the Pi reveals the IP address the Pi has been assigned. The Pi is now ready to be used remotely for the next steps.
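As a small illustrative alternative to running hostname -I manually (not part of the project's scripts), the Pi's LAN address can also be printed from Python, which can be handy when preparing a headless setup:

import socket

# Connecting a UDP socket towards a public address selects the outbound network
# interface without sending any packets; the local address of that interface is
# the one VNC or SSH clients on the same network should connect to.
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.connect(("8.8.8.8", 80))
print(s.getsockname()[0])
s.close()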
3.2 TP-Link NC450
The IP camera used in this project is a wireless wifi camera from TP-Link Technologies. It is a PTZ (Pan Tilt Zoom) camera with night vision capabilities along with sound and motion detection. The camera can easily be mounted on the ceiling or walls. It can rotate vertically up to 150 degrees and 360 degrees horizontally. The camera has a 1-megapixel CMOS sensor used for imaging and recording at a maximum resolution of 1280x720 at up to 30 fps. The camera also supports two-way audio, which means the user monitoring through the camera can communicate via the camera with people in the monitored area. The camera has a micro SD slot which can hold up to 32 GB of recorded content. TP-Link has its own cloud application which allows the camera to be controlled remotely via a smartphone. The camera can only connect to 2.4 GHz wireless networks. The TP-Link NC450 uses the H.264 video compression standard and claims to have ONVIF compliance. However, when the device was looked up in the official ONVIF conformant product search, no results showed TP-Link products as being conformant; therefore it is not truly ONVIF conformant. The camera, which claims to support ONVIF, exposes ONVIF on port 3702, and its RTSP port is 554. The Bonjour name of this camera is NC450 2.0-1ab651. Figure 9 below shows the camera used in this project.
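Before scripting the stream, it can be useful to verify from the Pi that the camera's RTSP port is actually reachable. The short sketch below is illustrative only; the IP address is the one assumed for the camera in this project, and because the ONVIF discovery port 3702 is UDP based, a plain TCP check like this only applies to the RTSP port.

import socket

CAMERA_IP = "192.168.1.100"      # assumed address of the NC450 on the local network

try:
    # Attempt a TCP connection to the camera's RTSP port (554).
    socket.create_connection((CAMERA_IP, 554), timeout=2).close()
    print("RTSP port 554 is reachable")
except OSError:
    print("RTSP port 554 is closed or unreachable")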
3.3 Motion Eye OS
As the name suggests, Motion Eye OS is itself an OS that replaces the existing Raspbian OS on the Pi. It has a web-based interface that can be operated in any browser and supports most IP cameras, including the TP-Link NC450. Motion Eye is capable of motion detection and still images, alerts can be enabled, and the media content can be viewed via an FTP server. This method of viewing the live video feed was also attempted using Motion Eye OS, and the steps leading to the results are covered shortly.
3.4 Scripting and RTSP Stream Testing
The main objective of this project is to achieve the desired live video feed using a programmed script that the Raspberry Pi runs. For the programming part, Python was chosen as the most suitable environment to work in. The first step is to ensure that Python 3 is installed and updated to the latest version from the terminal window. OpenCV, the open source software library, is used for this video surveillance system; its Python API is downloaded and installed following the documentation on the OpenCV website. The Thonny IDE is used for programming and writing the script. After installing OpenCV and with the help of the imported libraries, the scripting for this program could begin. RTSP plays the key role in the script. Upon researching the camera's streaming information, the RTSP stream link that works for the TP-Link NC450 is rtsp://<username>:<password>@<camera ip>:554/h264_hd.sdp. TP-Link has confirmed that port 554 will allow port forwarding for this camera. After importing the required library, the script is ready to run. Listing 4 below displays the main code for streaming the live video feed.
import cv2

# Open the camera's RTSP stream
cap = cv2.VideoCapture('rtsp://admin:[email protected]:554/h264_hd.sdp')
while cap.isOpened():
    ret, frame = cap.read()          # read the next frame from the stream
    if ret == True:
        cv2.imshow('frame', frame)   # show the frame in a window
        if cv2.waitKey(1) == 27:     # exit when the ESC key is pressed
            break
    else:
        break
cap.release()
cv2.destroyAllWindows()
Listing 4: Main program code for displaying live video stream
The wait key line in the code is included for terminating the session: the number 27 corresponds to the ASCII value of the ESC key, so when ESC is pressed the video session terminates instantly. However, when the RTSP stream was first tested multiple times, there were still errors in generating the live stream. The main reason for this was that the H.264 format was not being encoded and decoded, and this is where FFmpeg comes into play: after it was installed on the Pi from the terminal, the script was finally able to recognize the camera's media format. Another method of displaying the live video feed was also incorporated into this program, namely through an RTSP client such as VLC media player. For that purpose, VLC had to be installed from the terminal. After creating the code, the program successfully ran and displayed the live video feed in VLC. Listing 5 shows the program code for running the live feed in the VLC player.
import numpy as np
import cv2
import vlc

# Open the camera's RTSP stream directly in a VLC media player instance
player = vlc.MediaPlayer('rtsp://admin:[email protected]:554/h264_hd.sdp')
player.play()   # playback runs asynchronously while the interpreter stays alive
Listing 5: Program code for displaying video feed in VLC
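It should be noted that the vlc module imported in Listing 5 is provided by the python-vlc bindings, which in turn require the VLC application itself to be installed on the Pi; this is an assumption based on the import, as the exact installation steps are not repeated here.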
4 Results
The results obtained were surprisingly good. The main program code for generating the RTSP stream was successful after FFmpeg was installed on the Pi. After being run, the program automatically opens the live video stream in a separate window. It takes a couple of seconds for the camera to initialize before the stream becomes smoother and the frames become stable. For the first code, the generated video stream has a latency of approximately one second or less. An important observation regarding video latency was made: when the stream is viewed on the laptop display via VNC, the latency is greater than one second, the image quality is lower and frames drop in comparison to when the same stream is viewed on an HDMI display, where the latency is much lower. Figure 10 below shows the result obtained after running the main code.
Figure 10: Snap taken of live video feed display using RTSP stream link
Figure 11 below shows the live video feed being streamed on VLC when the second
code is run.
Figure 11: Image snapped of second code showing video feed on VLC
In comparison to the previous results, the latency is greater when the video is streamed in the VLC player than with the previous code. The video latency is approximately two seconds, the image quality is lower and the frame rate drops even further; there is a similar delay in the audio in VLC as well. However, when the video feed from the latter script is viewed on an HDMI display, it is much smoother than when it is viewed on the laptop via VNC.
A video feed display was also achieved using Motion Eye OS on the Raspberry Pi. For that purpose, Motion Eye OS was installed on the SD card to replace Raspbian. Upon installation, the Pi running as a video surveillance system has to be connected to a wired network for the first boot, during which it displays the IP address it has been assigned. That IP address is entered in a web browser and the login credentials have to be entered. Motion Eye then opens in the browser and the camera has to be added manually. When the RTSP stream link and the camera login credentials are entered, the camera is detected and starts live streaming. Camera streaming settings and other preferences can be changed in the settings tab. With the resolution of this camera set to 1280x720 and the frame rate to 30 fps, the resulting video latency is less than a second and the image quality is surprisingly good. The live video stream in Motion Eye OS can be seen in figure 12 below.
5 Conclusion
The main aim of this project was to create a program for the Raspberry Pi that shows live video streaming from an IP camera on a display. Apart from this main objective, another goal was to determine whether the less powerful Pi can manage video streaming without too much video latency. The intention was to create a system that allows users to remotely monitor an environment or specific areas using an IP camera. Practical implementations of such a system range from the surveillance of infants or pets in a private household to larger commercial businesses and companies. The fully functional video surveillance system was achieved as planned, in several ways. The core of this project revolves around the Pi 3B+ and the TP-Link NC450 wifi camera. Communication between the two is established wirelessly using the RTSP streaming protocol. The camera's H.264 video stream has to be encoded and decoded in order for the media data to be displayed, and FFmpeg along with OpenCV allows the compressed media format to be handled. Once the camera description file has the right encodings, the RTSP client can issue a SETUP request to the camera and later other method requests to establish a real-time session between the two, after which the media content is transmitted in RTP/RTCP data packets.
The Pi acts as an IoT system that runs the program written in Python. The results obtained were as expected and showed that the Pi can be used as a cost-effective, easy to use video surveillance system. With the final program, the video latency turned out to be less than one second, which is quite acceptable in terms of the efficiency and reliability of this system. When an HDMI display is employed for viewing the live footage, the latency is further reduced. It can be concluded that the remaining video latency is mostly a result of the Pi's performance capabilities and processing power. However, the operational Pi in conjunction with the TP-Link NC450 can raise doubts when it comes to security: the camera does not support two-factor authentication and can be prone to cyber breaches, so the overall reliability of this particular setup can be questioned in terms of security. These concerns could be addressed with more secure replacements for the NC450.
References
7. OpenCV [online]
URL: https://opencv.org/
Accessed on March 19, 2020
8. FFmpeg [online]
URL: https://en.wikipedia.org/wiki/FFmpeg
Accessed on April 4, 2020
9. MPEG-2 [online]
URL: https://en.wikipedia.org/wiki/MPEG-2
Accessed on April 15, 2020
13. RTSP [online]
URL: https://en.wikipedia.org/wiki/Real_Time_Streaming_Protocol
Accessed on May 3, 2020