
Article
UHD Database Focus on Smart Cities and Smart Transport
Lukas Sevcik 1,*, Miroslav Uhrina 1,* and Jaroslav Frnda 2

1 Faculty of Electrical Engineering and Information Technology, University of Zilina, Univerzitna 1, 010 26 Zilina, Slovakia
2 Faculty of Operation and Economics of Transport and Communications, University of Zilina, Univerzitna 1, 010 26 Zilina, Slovakia; [email protected]
* Correspondence: [email protected] (L.S.); [email protected] (M.U.)

Abstract: “Smart city” refers to a modern solution to organizing a city’s services, using cloud
technologies to collect and evaluate large amounts of data, including data from camera systems.
Smart city management covers several areas that can be implemented separately, but only their
combination can realize the overall desired smart city function. One of the core areas of smart city
automation is smart city transport. Transportation is a crucial system in any city, and this is why
it needs to be monitored. The primary objective of this publication is to generate high-quality 4K
UHD video sequences dedicated solely to smart cities and their transportation
systems. The resulting comprehensive database will be made accessible to all professionals in the field,
who can utilize it for extensive research purposes. Additionally, all the reference video sequences
will be transcoded into various quality settings by altering critical parameters like the resolution,
compression standard, and bit rate. The ultimate aim is to determine the best combination of video
parameters and their respective settings based on the measured values. This in-depth evaluation will
ensure that each video sequence is of the highest quality and offers the best experience that
the service providers can deliver. The video sequences captured will be analyzed for
quality assessments in smart cities or smart transport technologies. The database will also include
objective and subjective ratings, along with information about the dynamics determined by spatial
and temporal information. This will enable a comparison of the subjective evaluation of a selected
sample of our respondents with the work of other researchers, who may evaluate it with a different
sample of evaluators. The assumption of our future research is to predict the subjective quality based
on the type of sequence determined by its dynamicity.

Keywords: UHD database; smart cities; smart transport; subjective evaluation; VMAF; QoE

Citation: Sevcik, L.; Uhrina, M.; Frnda, J. UHD Database Focus on Smart Cities and Smart Transport. Electronics 2024, 13, 904. https://doi.org/10.3390/electronics13050904

Academic Editors: Aryya Gangopadhyay and Stefanos Kollias

Received: 12 January 2024; Revised: 15 February 2024; Accepted: 22 February 2024; Published: 27 February 2024

Copyright: © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction
It is crucial to have a reliable evaluation of the quality of video services offered
online, including those of video-on-demand. Such services are rapidly expanding
and gaining widespread popularity, as seen with YouTube, Netflix, and other
streaming platforms. The technology used for streaming is also evolving, from
the traditional non-adaptive form to adaptive streaming. Additionally, internet video
streaming technology is moving from a connection-oriented video transport
protocol, such as the Real-Time Messaging Protocol (RTMP), to adaptive streaming that
utilizes HTTP.
The use of surveillance cameras has increased significantly, leading to challenges
in storing, transmitting, and analyzing video data. As a result, there is a great need
for a reliable system to manage large amounts of video data. Such a system should
have efficient video compression, stable storage, and high-bandwidth Ethernet or internet
transmission capabilities. In modern cities, distributed cameras capture video from various
scenarios. Video sensor systems can provide valuable data for improved traffic planning
and management. Future intelligent road technologies will rely heavily on the quality
and quantity of such data. Video-based detection systems are an essential component of
intelligent transport systems and offer a flexible means of acquiring data. They are also
evolving rapidly, which makes them highly promising.
We must bear in mind that quality is a crucial factor in many industries. In the video
industry, various databases are designed to assess video quality using objective or subjective
metrics. We have compared various databases produced by different institutions. The Ultra
Video Group [1] shot 16 4K quality video sequences, capturing different spatio-temporal
information and covering all four quadrants. They used well-known sequences, such as
Beauty, Bosphorus, Jockey, or ReadySetGo, as a good basis for testing. The SJTU database [2]
contains ten 4K video sequences, some of which could be used in the smart city monitoring
industry, such as Bund Nightscape, Marathon, Runners, Traffic and Building, and Traffic
Flow. However, this database is not designed specifically for this purpose. In [3], a database
of video sequences suitable for mobile viewing was described, exploring the impact of
adaptive streaming for mobile phones, where the quality can vary depending on the
connection. In [4], a database representing different distortions was created and presented,
taking into account not only temporal and spatial information, but also colorfulness,
blurriness, or contrast.
The authors proposed a database called LIVE-NFLX-II in the paper [5]. This database
contains subjective QoE responses for various design dimensions, such as different bit rate
adaptation algorithms, network conditions, and video content. The content characteristics
cover a wide range, including natural and animated video content, fast and slow scenes,
light and dark scenes, and low and high texture scenes. In [6], the authors proposed and
built a new database, the LIVE-Qualcomm mobile in-capture video quality database, which
contains a total of 208 videos that model six common in-capture distortions. The coding
structure, syntax, various tools, and settings relevant to the coding efficiency have been
described in [7]. In the paper, the perception of compression as well as spatial and temporal
information was further investigated. The authors compiled an extensive database of video
sequences, whose quality was subjectively evaluated.
To enhance the accuracy of video quality prediction in NR, a comprehensive video
quality assessment database was developed in [8]. This database comprises 585 videos
featuring unique content, captured by a diverse group of users with varying levels of
complexity and authentic distortions. The subjective video quality ratings were determined
through crowdsourcing. To effectively analyze and utilize data in areas such as smart cities
and smart traffic, it is crucial to expand the existing databases. This means adding new
sequences that solely contain snapshots of the city, traffic, or other relevant information.

2. Related Work
Quality of Experience (QoE) is highly dependent on QoS parameters, as factors such
as latency, jitter, or packet loss are also important in video traffic. Although such factors
are easily measurable, QoE cannot be easily quantified. Currently, one of the most popular
network services is live video streaming, which is growing at a rapid scale [9,10]. In [11],
a detailed quantitative analysis of video quality degradation in a homogeneous HEVC
video transcoder was presented, along with an analysis of the origin of these degradations
and the impact of the quantization step on the transcoding. The differences between
video transcoding and direct compression were also described. The authors also found
a dependency between the quality degradation caused by transcoding and the change in
the bit rate of the transcoded stream.
In [12], the authors compared the available 4K video sequences. The compression
standards H.264 (AVC), H.265 (HEVC), and VP9 were compared. Video sequences were
examined using objective metrics like PSNR, SSIM, MS-SSIM, and VMAF. Recently, many
experts and researchers have provided quality performance analyses of well-known video
codecs such as H.264/AVC, H.265/HEVC, H.266/VVC, and AV1. The authors in [13]
performed an analysis between the HEVC and VVC codecs for test sequences with a
resolution ranging from 480p up to ultra HD (UHD) resolution using the Peak Signal-to-Noise
Ratio (PSNR) objective metric. In paper [14], the rate distortion analysis of
the same codecs using the PSNR, Structural Similarity Index (SSIM), and Video Multi-
Method Assessment Fusion (VMAF) quality metrics was provided. The authors in [15,16]
assessed the video quality of the HEVC, VVC, and AV1 compression standards for test
sequences with resolutions varying from 240p to UHD/4K, and in [17,18] at full HD
(FHD) and ultra HD (UHD) resolutions, respectively. The compression efficiency was
calculated using the PSNR objective metric. In [17,18], for quality evaluation, the Multi-
Scale Structural Similarity Index (MS-SSIM) method was used. Paper [19] presents a
comparative performance assessment of five video codecs—HEVC, VVC, AV1, EVC, and
VP9. The experimental evaluation was performed on three video datasets with three
different resolutions—768 × 432, 560 × 488, and 3840 × 2160 (UHD). Paper [20] deals with
an objective performance evaluation of the HEVC, JEM, AV1, and VP9 codecs using the
PSNR metric. A large test set of 28 video sequences with different resolutions varying from
240p to ultra HD (UHD) was generated. Paper [21] examines the compression performance
of three codecs, namely HEVC, VVC, and AV1, measured with the PSNR and SSIM objective
video quality metrics. In paper [22], the authors compared the coding performance of
HEVC, EVC, VVC, and AV1 in terms of computational complexity.
In [23], the authors proposed a new methodology for video quality assessment using
the just-noticeable difference (JND). The publication focuses on describing the process of
subjective tests. In [24], the authors presented an empirical study of the impact of packet-
loss-related errors on television viewing. They compared different delivery platforms and
technologies. Video sequences and delivery quality information obtained from the service
provider were used in the experiments. The sequence length, content, and connection
type were compared. In [25], 16 types of metrics were compared for quality assessment.
Packet loss was simulated in the video encoding and the losses were then hidden using
different techniques to conceal the errors. The purpose was to show that the subjective
quality of a video cannot be predicted from the visual quality of the frame alone when
some hidden error occurs. In [26], a new objective indicator, the pixel loss rate (XLR), was
proposed. It evaluates the packet loss rate during video streaming. This method achieved
comparable results with fully benchmarked metrics and a very high correlation with MOS.
In [27], the authors provided an overview of packet loss in Wi-Fi networks, mainly for
real-time multimedia.
In [28], an optimal packet classification method was proposed to classify packets
that were given a different priority when the transmission conditions deteriorated. The
network transmits the segments with the highest priority concerning perception quality
when constrained network conditions occur. The results showed that the proposed method
can achieve higher MOS scores compared to non-selective packet discarding. The authors
in [29] stated that highly textured video content is difficult to compress because a trade-off
between the bit rate and perceived video quality is necessary. Based on this, they introduced
a video texture dataset that was generated using a development environment. It was named
the BVI-SynTex video dataset and was created from 196 video sequences grouped into
three different texture types. It contains five-second full HD video scenes with a frame rate
of 60 fps and a depth of 8 bits.
Video analysis in smart cities is very useful when deploying various systems and
investigating their performance. A description of the technologies used for video analysis
in smart cities can be found in [30]. With the help of this analysis, various situations have
been studied, including traffic control and monitoring, security, and entertainment. In [31],
the authors evaluate the transmission of multimedia data at a certain throughput setting.
They then evaluate the performance and describe the benefits of real-time transfer. This
publication describes video surveillance in smart cities and multimedia data transmission
with cloud utilization. They discuss the impact of network connectivity on the transmission
of multimedia data over a network. An algorithm dealing with image and video processing
was presented by the authors in [32]. This solution suppresses noise to achieve accuracy in
the traffic scene. This knowledge was then used in a smart lighting experiment in a smart
city system.
The transmission of streaming video in automotive ad-hoc networks is addressed
by the authors in [33]. These scenarios investigate the conditions that affect the quality
of streaming, which is simulated in the NS3 environment. The publication [34] discusses
the influence of the resolution, number of subjects per frame, and frame rate on the
performance of metrics for object recognition in video sequences. The authors used videos
taken from cameras placed at intersections, which captured the scene from above. Changing
the cropping method or changing the frame width was described. The classification of
municipal waste using an automated system was proposed in paper [35]. The suggested
model classified the waste into multiple categories using convolutional neural networks.
The authors proposed a blind video quality assessment (BVQA) method based on
a DNN to compare scenarios in the wild in [36]. Transfer learning methods with spatial
and temporal information were used. They used the DNN to account for the motion
perception of video sequences for spatial features. They applied the results to six different
VQA databases. In this work, the authors used their knowledge from image quality
assessment (IQA). The authors of the research paper [4] have developed a database of
videos. This database is sampled and subjectively annotated and is intended to display
authentic distortions. To ensure that the dataset was diverse in terms of content and
multidimensional quality, six attributes were computed. These attributes included spatial
information, temporal information, blur, contrast, color, and VNIQE. The paper introduces
a new VQA database called KoNViD-1k.
In paper [37], the authors propose an Enhanced Quality Adaptation Scheme for DASH
(EQASH). The proposed scheme adjusts the quality of the segments not only based on
the network and playback buffer status but also based on the VBR characteristics of the
contents. The proposed scheme also reduces the latency by employing the new server
push feature in HTTP 2.0. According to a study [38], a video playback schedule that has
a minimum number of low-resolution video segments provides the best QoE. The paper
presents the M-Low linear time scheduling algorithm, which adjusts the video resolution
and optimizes the QoE indices in the DASH streaming service. The authors of the study
describe several QoE metrics, including the minimization of resolution switching events,
freeze-free playback, the maximization of the video playback bit rate, and the minimization
of low-resolution video segments.
The introduction of smart cities has revolutionized the way that we live, work, and
commute. Although the papers presented in the Introduction [1–8] have created various
databases, none of them have covered the content of smart cities directly. After conducting
extensive research, it has become evident that this topic is highly relevant and the creation
of a database that focuses on smart transportation and smart cities will be highly beneficial.
The primary objective of this work is to create a database that is unprecedented in its
type. The database will contain images that capture smart transportation and smart cities.
Additionally, the reference sequences will be transcoded into different quality settings.
The final transcoded images will be evaluated subjectively and objectively to ensure that
they meet the desired quality standards. This will enable researchers, developers, and
stakeholders to have access to high-quality images that can be used in various applications
related to smart cities.

3. Motivation
The concept of a smart city has emerged in the last decade, as a combination of ideas
on how information and communication technologies can enhance the functioning of cities.
The goal is to improve efficiency and competitiveness and provide new ways to address
poverty and social deprivation. The main idea is to coordinate and integrate technologies
that can bring new opportunities to improve quality of life. A smart city can take various
forms, such as a virtual, digital, or information city. These perspectives emphasize the role
of information and communication technologies in the future operation of cities.

The concept of Quality of Experience was introduced as an alternative to Quality
of Service (QoS) to design more satisfactory systems and services by considering human
perception and experience. Our research focuses on QoS as well as QoE and their in-
terconnection using a mapping function, followed by prediction. We test the impact of
various quality parameters, such as the resolution, bit rate, and compression standard, on
the resulting quality. In the case of QoE, we use subjective metrics for evaluation, while,
for QoS, objective metrics are used. We also simulate the impact of packet loss, delay, or
network congestion using a simulation tool to understand their effects on quality.
Based on the results and evaluations obtained, we recommend an appropriate choice
of parameters that will guarantee the maximum quality for the end user while ensuring
bandwidth efficiency for the provider. By combining these parameters, we can set the
variable bit rate (VBR) to stream the video as efficiently as possible. In a classical streaming
scenario, the video is viewed at one specific resolution, which is predefined before each
session is started using a connection-oriented transport layer protocol. Adaptive streaming,
on the other hand, involves encoding the video at multiple discrete bit rates. Each bitstream
or video with a specific resolution is then divided into sub-segments or chunks, each a
few seconds long (typically around 2–15 s). For optimal video quality during
playback, it is important to ensure that the end user’s connection conditions and download
speed are taken into consideration. VBR encoding can lead to inconsistencies in video block
size, which can cause frequent re-caching and reduce the user’s QoE, especially when the
network bandwidth is limited and fluctuating.
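As a toy illustration of the adaptation decision described above (not the algorithm of any particular player), a purely rate-based heuristic picks the highest encoded rung that fits under the measured throughput with a safety margin:

```python
def pick_rung(ladder_kbps, throughput_kbps, safety=0.8):
    # Return the highest encoded bit rate (in kbps) that fits within a
    # safety margin of the measured download throughput; if none fits,
    # fall back to the lowest rung to keep playback alive.
    fitting = [r for r in sorted(ladder_kbps) if r <= throughput_kbps * safety]
    return fitting[-1] if fitting else min(ladder_kbps)

# Ladder matching the bit rates used later in this work: 5, 10, and 15 Mbps.
ladder = [5000, 10000, 15000]
```

With this ladder and a measured throughput of 14 Mbps, the heuristic selects the 10 Mbps rung; real players additionally weigh the playback buffer occupancy to avoid the frequent re-caching effect mentioned above.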
In this publication, we discuss the impact of various quality settings, such as the
codec used, resolution, and bit rate, on the overall quality. Both objective and subjective
metrics are used to determine the quality. Quality and appropriately set parameters are
also important in the field of smart cities and traffic. The rapid expansion of cities in recent
years has resulted in urban problems such as traffic congestion, public safety concerns, and
crime monitoring. Smart city technologies leverage data sensing and big data analytics
to gather information on human activities from entire cities. These data are analyzed to
provide intelligent services for public applications.

4. Methodology
Video quality analysis focuses on packet loss in the network, which, depending on the
codec used, causes artifacts in the video. We use QoE metrics to determine user satisfaction
boundaries and, most importantly, the application of such QoS tools in the network to guar-
antee the minimum QoE expected by the user. The use of the internet as an environment for
multimedia delivery is quite common today, but it is not entirely guaranteed that the user
will receive, in such an environment, a service with the desired quality. This makes QoE
monitoring and the search for links between QoS and QoE all the more important today.
It is essential to evaluate the performance of systems for the sending of information
from one source to another (data link) and ensure efficient information transfer. When
evaluating the transmission quality of IPTV services, we focus on user satisfaction with the
quality of media content. It is generally assumed that high performance and transmission
quality result in high user satisfaction with the service. From a human perceptual point
of view, quality is determined by the perceived composition, which involves a process
of perception and judgment. During this process, the perceiver compares the perceived
events with a previously unknown reference. The nature of the perceived composition may
not necessarily be a stable characteristic of the object, as the reference may influence what
is currently perceived. Quality is usually relative and occurs as an event in a particular
spatial, temporal, and functional context.
Objective quality assessment is an automated process, as opposed to subjective assessment,
which requires human involvement. Objective video quality models can be divided into
three types, classified by how much information about the original signal is available at
the receiver: full-reference (FR), reduced-reference (RR), and no-reference (NR) methods.
In our evaluation, we use FR objective methods (SSIM, MS-SSIM, PSNR, and VMAF). A more
detailed description can be found in our previous publications [39,40]. As a subjective
metric, we use the no-reference ACR (Absolute Category Rating) method, because, in this
case, the video is rated only on the basis of the viewed sequence and not by comparison
with a reference. In a real environment, when receiving a signal from a service provider,
the end user likewise sees only the received signal and cannot compare it with the reference
original. The quality is expressed on a 5-point MOS scale. This standard [41] provides a
methodology for the subjective assessment of the quality of voice and video services from
the end user’s perspective. This metric summarizes ratings that are averaged on a scale
from 1, which is the worst quality, to 5, which represents excellent quality. For more
information, see our
publication [40].
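The averaging behind the MOS scale can be sketched in a few lines. This is an illustration under a normal approximation, not the exact statistical treatment used in our evaluation:

```python
from math import sqrt
from statistics import mean, stdev

def mos(scores):
    # Mean Opinion Score of ACR ratings (1 = bad ... 5 = excellent),
    # with a 95% confidence interval under a normal approximation.
    m = mean(scores)
    ci = 1.96 * stdev(scores) / sqrt(len(scores)) if len(scores) > 1 else 0.0
    return m, ci
```

For example, a panel giving the ratings 5, 4, 4, 5, 3 yields a MOS of 4.2 with a confidence interval of about ±0.73.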

5. Methods of Proposed Model


Our primary goal is to create video sequences in ultra HD 4K resolution, which will
contain various shots that map the traffic and the city. The created database of video
sequences will cover both static and dynamic sequences. The created video sequences
will then be transcoded to the necessary quality settings and objectively and subjectively
rated. Furthermore, they will be accessible for subjective evaluation by another group.
Each sequence will be identifiable by different parameters, e.g., spatial information (SI) and
temporal information (TI). Subsequently, using a neural network, an appropriate bit rate
can be allocated to each video sequence to achieve the desired quality.
To begin with, we had to take numerous shots, from which we selected reference
video sequences. These were chosen to cover as much space as possible in the SI and TI
quadrants. Their description can be found in Section 6.1. The next step was to encode these
reference sequences into a combination of full HD and ultra HD resolutions, using the
H.264 (AVC) and H.265 (HEVC) compression standards and bit rates of 5, 10, and 15 Mbps
using FFmpeg. FFmpeg is a collection of programs and libraries that enable the processing,
editing, or playing of multimedia files. It is operated via the command line. In our case, the
multimedia content had to be first encoded into a defined codec, which is a compression
algorithm. Then, it was decoded to enable its use. With transcoding, it is possible to convert
multimedia files to a different file container or codec, or to use different frame rates.
The selection and evaluation processes are illustrated in Figure 1. The encoding
process can be found in Section 6.2. After transcoding, the sequences are characterized
again using SI and TI information. The sequences are evaluated using objective metrics
such as SSIM, MS-SSIM, PSNR, and VMAF (see Section 6.3 for a description). The subjective
metric ACR evaluation is described in Section 6.4.
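Of the four full-reference metrics listed, PSNR is compact enough to sketch directly. The fragment below is a minimal NumPy illustration for 8-bit luminance planes, not the measurement tooling used in this work:

```python
import numpy as np

def psnr(ref, dist, peak=255.0):
    # Full-reference PSNR in dB between two equally sized luminance planes.
    mse = np.mean((ref.astype(np.float64) - dist.astype(np.float64)) ** 2)
    if mse == 0.0:
        return float("inf")  # identical frames: no distortion
    return 10.0 * np.log10(peak ** 2 / mse)
```

SSIM, MS-SSIM, and VMAF model perception more closely and are typically computed with dedicated libraries rather than a few lines of array arithmetic.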

Figure 1. The selection and evaluation scheme of the video sequences.


6. Results
In this section, we describe how the database was created, the encoding of the resulting
video sequences, their characteristics, and then the objective and subjective evaluation of
each sequence.

6.1. Description of the Dataset


The video sequences were captured using a DJI Mavic Air 2 drone. This device does
not allow the shooting of video sequences in uncompressed .yuv format; however, UHD
(3840 × 2160) recording is available. The parameters chosen for shooting can be found in
Table 1. True 4K, with a resolution of 4096 × 2160, is not yet used commercially; UHD is
preferred due to its 16:9 aspect ratio, which is why a UHD resolution was used for our
recording. The aim of this work is to create a
database of 4K video sequences that cover scenes from traffic monitoring and cityscapes.
These sequences will be encoded to the necessary quality parameters and rated either
objectively or subjectively. The video sequences that we have created offer a wide variety
of dynamicity, whether in the dynamics of the objects in the video or the dynamics of
the camera.
The following video sequences have been created, focusing on transport:
• Dynamic road traffic—dynamic camera motion—frequent traffic at higher vehicle
speeds (name in our database: Sc1);
• Dynamic road traffic—static camera motion (Sc2);
• Parking lot—dynamic camera motion—less dynamic movement of cars in a parking
lot (Sc8);
• Parking lot—static camera motion (Sc3);
• Road traffic—busy traffic at lower vehicle speeds (Sc4);
• Traffic roundabout—dynamic camera motion—traffic on a roundabout (Sc5);
• Traffic roundabout with a parking lot—a dynamic part of the scene with slow move-
ment in the parking lot (Sc6);
• Traffic roundabout—static camera motion—traffic on a roundabout (Sc10);
• Train station—train leaving the station (Sc7);
• Dynamic train—train in dynamic motion (Sc9);
• Trolleybus—trolleybus arriving at a public transport stop;
• Dynamic trolleybus—trolleybus in dynamic driving;
• The university town—university town (movement of people);
• Waving flags—flags flying in the university town.
A preview of the reference sequences can be found in Figure 2.

Table 1. Recording parameters.

Name of the Parameter Value of the Video Sequence Parameter


Resolution Ultra HD (3840 × 2160)
Compression standard H.265/HEVC
Bit rate 120 Mbps
Video frame rate 50 fps (frames per second)
Subsampling 4:2:0
Bit depth 8b

Each video sequence was evaluated based on its SI and TI values. The resulting
parameter value is the maximum value across all frames. The temporal information is
derived from changes in the brightness of the same pixels in successive frames, while the
spatial information is obtained from the Sobel filter for the luminance component and the
subsequent calculation of the standard deviation of the pixels. More details can be found in
our publication [40]. Table 2 shows the characterization of the reference sequences based
on the SI and TI values, which highlights the diversity of the individual sequences in terms
of spatial and temporal information.

Figure 2. Previews of created sequences.

Table 2. SI and TI information of the reference video sequences.

Sequence Max SI Max TI
Sc1 90.8834 35.9334
Sc2 83.8965 6.29607
Sc3 85.1865 5.19369
Sc4 88.624 19.3205
Sc5 72.303 22.8181
Sc6 74.9079 11.1935
Sc7 96.9839 16.6872
Sc8 78.7106 5.47847
Sc9 87.0767 24.1257
Sc10 76.3554 5.06084
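The SI/TI computation summarized above can be sketched with NumPy. This is a simplified illustration of the P.910-style measures (the maximum over frames of the standard deviation of the Sobel-filtered luminance for SI, and of successive frame differences for TI), not the exact script behind Table 2:

```python
import numpy as np

def sobel_mag(luma):
    # Gradient magnitude of a 2-D luminance plane using 3x3 Sobel kernels.
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = luma.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(3):
        for j in range(3):
            patch = luma[i:i + h - 2, j:j + w - 2]
            gx += kx[i, j] * patch
            gy += ky[i, j] * patch
    return np.hypot(gx, gy)

def si_ti(frames):
    # frames: sequence of 2-D float arrays (one luminance plane per frame).
    # SI: max over frames of std of the Sobel-filtered luminance.
    # TI: max over frames of std of the difference between successive frames.
    si = max(float(sobel_mag(f).std()) for f in frames)
    ti = max(float((b - a).std()) for a, b in zip(frames, frames[1:]))
    return si, ti
```

A static, flat scene yields SI = TI = 0, while any spatial edge raises SI and any brightness change between frames raises TI, which is what separates the static and dynamic sequences in Table 2.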

6.2. Encoding of the Reference Video Sequences


The first ten reference sequences from the list above were further encoded to the full
HD (1920 × 1080) resolution and H.264/AVC compression standard. These are labeled
“Sc_x”, where x identifies the scene, so that each encoded variation can be named. These reference sequences were selected
precisely based on the characterization of temporal and spatial information. The quality of
the encoded content is determined by the amount of data lost during the compression and
decompression of the content. In real broadcasting, a bit rate of 10 Mbps is often used for
HD resolution, and some stations use bit rates up to around 15 Mbps. This bit rate is also
taken into account for UHD deployments. Therefore, each sequence in both resolutions
and compression standards has been encoded with bit rate values of 5, 10, and 15 Mbps.
Table 3 shows the parameters that we used in encoding the video sequences. This
combination produced twelve variations for each one, which means 120 sequences. We
evaluated each with objective metrics. Seven of them were also evaluated subjectively
(Sc1–5, Sc9–10). The seven sequences for subjective evaluation were selected to comply
with the recommendations [42]. With the combination of seven sequence types with twelve
coding variations, we could evaluate one group continuously without a long pause. If we
selected more sequence types, we would have needed to split the subjective evaluation of
one group of evaluators, which would imply a larger time requirement. When selecting
these seven video sequences, we also considered the calculation of the spatial and temporal
information of the sequences.
The created database will be available to the general scientific public. The created video
sequences can be further used for the needs of the analysis of the appropriate qualitative
setting in order to provide the highest possible quality while saving as many network
resources as possible. It will thus be possible to further extend the database with newly
shot sequences, which can then be evaluated by objective or subjective tests. This
will give a detailed view of the performance of streaming video over IP-based networks.
Video sequences offer the possibility to test which other parameters can characterize a given
sequence, or how individual video parameters affect the quality. It will also be possible
to see at which bit rate each scene achieves the highest end user satisfaction in terms of
quality and thus define boundaries for each scene based on selected content information. A
suitable bit rate would be assigned for each boundary so that it satisfies the highest quality
at the individual resolution. This will allow technologies and applications in the smart
cities and smart traffic sector to use the available resources efficiently.
We encoded the created reference sequences with changing quality parameters using
FFmpeg. A coding example for 15 Mbps is as follows:
ffmpeg -i input_sequence -vf scale=resolution -c:v codec -b:v 15000k -maxrate 15000k
-bufsize 15000k -an -pix_fmt yuv420p -framerate 50 SeqName.ts
A description of the individual parameters used in the command is as follows:
- -i imports the video from the selected file;
- -vf scale specifies the resolution of the video; in our case, this parameter was set to
full HD (1920 × 1080) or ultra HD (3840 × 2160);
- -c:v selects the video codec; we used two codecs: H.264/AVC, written as libx264, and
H.265/HEVC, written as libx265;
- -b:v selects the bit rate; we varied this parameter at 5, 10, and 15 Mbps;
- -maxrate sets the maximum bit rate tolerance; it requires bufsize in the settings;
- -bufsize sets the size of the rate-control buffer;
- -an removes the audio track from the video;
- -pix_fmt selects the chroma subsampling;
- -framerate sets the number of frames per second.
The last parameter is the video output, where we set the video name and its format.
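The full encoding grid of Table 3 (2 resolutions × 2 codecs × 3 bit rates = 12 variants per reference) lends itself to scripting. The following is a hypothetical Python sketch that only builds the FFmpeg command strings; the `build_commands` helper and the output file naming (which here adds the resolution to the paper's sequence-name_codec_bitrate key) are our own illustration.

```python
from itertools import product

# Encoding grid from Table 3.
RESOLUTIONS = {"FullHD": "1920:1080", "UltraHD": "3840:2160"}
CODECS = {"H264": "libx264", "H265": "libx265"}
BITRATES_MBPS = [5, 10, 15]

def build_commands(reference):
    """Return one FFmpeg command line per coded variant of a reference sequence."""
    cmds = []
    for (res_name, scale), (std, codec), mbps in product(
            RESOLUTIONS.items(), CODECS.items(), BITRATES_MBPS):
        rate = f"{mbps * 1000}k"                      # e.g. 15 Mbps -> "15000k"
        out = f"{reference}_{std}_{mbps}M_{res_name}.ts"
        cmds.append(
            f"ffmpeg -i {reference}.mp4 -vf scale={scale} -c:v {codec} "
            f"-b:v {rate} -maxrate {rate} -bufsize {rate} "
            f"-an -pix_fmt yuv420p {out}")
    return cmds

commands = build_commands("Sc6")
print(len(commands))  # 12 variants per reference sequence
```

Running the generated commands through a shell or `subprocess` would then reproduce the 12 coded variants of each scene, 120 sequences in total for the 10 references.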
Table 3. Parameters of encoded sequences.

Resolution              Full HD, Ultra HD
Compression standard    H.264/AVC, H.265/HEVC
Bit rate [Mbps]         5, 10, 15
Frames per second       50 fps
Subsampling             4:2:0
Bit depth               8 bit

Each output sequence was encoded into a .ts container so that we can test the impact
of packet loss in the future. Programs such as MediaInfo and Bitrate Viewer can be used
to check the individual transcoded parameters: MediaInfo displays all the parameters and
settings of the video, while Bitrate Viewer displays the bit rate over time.
We have included a five-second pause between each sequence to ensure that the
evaluators do not overthink the evaluation and that it remains spontaneous. The pause
shows a grey background, so that the image is not distorted and does not draw the eyes of
the evaluators. We inserted text into the grey background describing the rating, so that the
raters know which part of the evaluation process they are in.
To ensure an accurate evaluation, a maximum of three people participated simultaneously,
and they had a direct and undistorted view of the TV set. The video sequences
were evaluated on a Toshiba 55QA4163DG TV set placed 1.1 m from the raters, in compli-
ance with the standard [42]. The distance between the viewer and the monitor should be
1.5 times the height of the monitor. Each evaluator had access to a questionnaire, where
they recorded their evaluations of each sequence. A total of 30 human raters participated
in the evaluation, rating 84 video sequences. The MOS rating scale of 1 to 5 was used for
the evaluation, where 1 represents the worst quality, while 5 is the best.
In the following sections, we will analyze the outcomes of both the objective (see Section 6.3)
and subjective (see Section 6.4) evaluations. Please note that the results presented here are
based on selected samples only, while all other numerical or graphical data can be obtained
upon request. Moreover, we are currently working on creating a website where the entire
database will be published and available for free.

6.3. Objective Quality Evaluation
In the case of objective evaluation, we selected one video sequence to present the
results, namely the traffic roundabout with a parking lot (Sc6). For this sequence, we
present the frame-by-frame evaluation for the individual objective metrics (SSIM,
MS-SSIM, PSNR, and VMAF) at a 15 Mbps bit rate in both resolutions and codecs. The
results are normalized to the range <0, 1>, which makes it possible to compare the overall
correlation of the evaluated metrics.
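The paper does not state the exact normalization used. One plausible approach, sketched below, is to divide metrics with a fixed scale (VMAF's 0-100; SSIM is already in [0, 1]) by their maximum, and to min-max scale an unbounded metric such as PSNR per sequence; the `min_max_normalize` helper and the sample values are our own illustration.

```python
def min_max_normalize(scores):
    """Map per-frame metric scores into <0, 1> by min-max scaling."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]  # constant quality across frames
    return [(s - lo) / (hi - lo) for s in scores]

psnr = [38.2, 39.1, 40.5, 37.8]   # hypothetical per-frame PSNR values in dB
vmaf = [82.0, 85.5, 90.1, 80.3]   # hypothetical per-frame VMAF values (0-100)

print(min_max_normalize(psnr))    # best frame maps to 1.0, worst to 0.0
print([v / 100 for v in vmaf])    # VMAF has a fixed 0-100 scale
```

Plotting the normalized curves on a common axis is what allows the per-frame correlation between the four metrics to be compared visually.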
The results for the ultra HD resolution for the H.265 (HEVC) codec can be seen in
Figure 3 and for the H.264 (AVC) codec in Figure 4. The full HD resolution can be viewed
in Figure 5 for the H.265 (HEVC) compression standard and in Figure 6 for the H.264 (AVC)
compression standard. With such a high bit rate, the H.265 compression standard achieves
better results compared to H.264 for both resolutions.
The full HD resolution achieves a better rating; with an increasing bit rate, the
difference becomes smaller. Comparing Figures 3 and 4, we can conclude that the ratings
correlate with each other, with noticeably equal rating shifts in both compression standards.
At full HD resolution, we can observe a larger variation between the H.265 compression
standard (Figure 5) and H.264 (Figure 6). Mapping the results of the different objective
metrics confirms the high correlation between the methods used and makes the comparison
of these metrics easier for readers. We can also see that the VMAF scores oscillate more
than the results of the other metrics.
Figure 3. Traffic roundabout with a parking lot (Sc6): Ultra HD, H.265, 15 Mbps norm.

Figure 4. Traffic roundabout with a parking lot (Sc6): UHD, H.264, 15 Mbps norm.

Figure 5. Traffic roundabout with a parking lot (Sc6): Full HD, H.265, 15 Mbps norm.

We present the final results for each sequence (Sc1–Sc10) in the form of mean values
of the VMAF and PSNR metrics for the 15 Mbps bit rate. As expected, the H.265 codec
achieves better results, and we can also see an improvement in the results with an increasing
bit rate value. The results of the objective evaluation of all sequences for the ultra HD
resolution in combination with the H.265 (HEVC) codec are shown in Figure 7 and for the
H.264 (AVC) codec in Figure 8. The results for the other sequences also confirm that the
H.265 (HEVC) compression standard has a better rating. For some sequences, the difference
is more pronounced, which is due to the dynamics of the scene.

Figure 6. Traffic roundabout with a parking lot (Sc6): Full HD, H.264, 15 Mbps norm.

Figure 7. Mean values of VMAF and PSNR for UHD, H.265, 15 Mbps.

At full HD resolution, the differences between H.265 (HEVC), which can be seen in
Figure 9, and H.264 (AVC), shown in Figure 10, are smaller. In both cases, the full HD
resolution achieves higher values than the ultra HD resolution.

6.4. Subjective Quality Evaluation

In this section, we present the results of the subjective evaluation of the seven reference
sequences, which were re-encoded with various quality parameters. We calculated the
average ratings from 30 users for each type of coded sequence. The results for Sc1 (dynamic
road traffic, dynamic camera motion) and Sc2 (dynamic road traffic, static camera motion)
are shown in Figure 11a. The results for Sc3 (parking lot, static camera motion) and Sc4
(road traffic) can be seen in Figure 11b, while the results for Sc5 (traffic roundabout,
dynamic camera motion) and Sc10 (traffic roundabout, static camera motion) are presented
in Figure 11c.

Figure 8. Mean values of VMAF and PSNR for UHD, H.264, 15 Mbps.

Figure 9. Mean values of VMAF and PSNR for full HD, H.265, 15 Mbps.

In Table 4, one can find the complete results of the transcoded sequences from the
dynamic train (Sc9) reference sequence. Table 4 includes the average result as well as the
exact number of occurrences for each MOS scale value.

Figure 10. Mean values of VMAF and PSNR for full HD, H.264, 15 Mbps.


Figure 11. Average values of subjective evaluation. (a) Subjective evaluation of Sc1 and Sc2.
(b) Subjective evaluation of Sc3 and Sc4. (c) Subjective evaluation of Sc5 and Sc10.

Table 4. Dynamic train—train in dynamic motion (Sc9).

                                         MOS Score
                                        1    2    3    4    5    Average Value
Sequence 1: (15 Mbps, H.264, UHD)       0    6   11    9    4    3.37
Sequence 2: (10 Mbps, H.264, UHD)       2    7   14    7    0    2.87
Sequence 3: (15 Mbps, H.264, Full HD)  10    8    7    4    1    2.27
Sequence 4: (10 Mbps, H.264, Full HD)   1    5   15    8    1    3.10
Sequence 5: (15 Mbps, H.265, UHD)       0    3    7   11    9    3.87
Sequence 6: (10 Mbps, H.265, UHD)       2    6   10    9    3    3.17
Sequence 7: (15 Mbps, H.265, Full HD)   4    6   11    8    1    2.87
Sequence 8: (10 Mbps, H.265, Full HD)   2    6   10    9    3    3.17
Sequence 9: (5 Mbps, H.264, UHD)       20    8    1    1    0    1.43
Sequence 10: (5 Mbps, H.265, UHD)       3    7   12    6    2    2.90
Sequence 11: (5 Mbps, H.264, Full HD)   7    7    9    7    0    2.53
Sequence 12: (5 Mbps, H.265, Full HD)   1   11    8    8    2    2.97
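The average values in Table 4 are the vote-count-weighted means over the 30 raters; for example, Sequence 1 gives (2·6 + 3·11 + 4·9 + 5·4)/30 = 3.37. A short sketch of that calculation (the `mos_from_counts` helper is our own):

```python
def mos_from_counts(counts):
    """Average MOS from the number of votes for each score 1-5."""
    total = sum(counts)
    return sum(score * n for score, n in enumerate(counts, start=1)) / total

# Vote counts (scores 1..5) for two rows of Table 4.
seq1 = [0, 6, 11, 9, 4]    # 15 Mbps, H.264, UHD
seq9 = [20, 8, 1, 1, 0]    # 5 Mbps, H.264, UHD

print(round(mos_from_counts(seq1), 2))  # 3.37
print(round(mos_from_counts(seq9), 2))  # 1.43
```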
6.5. Correlation between Objective and Subjective Assessments

There are various metrics to express the correlation between subjective and objective
assessments. The two most commonly used statistical measures of performance are the
Root Mean Square Error (RMSE) and Pearson's correlation coefficient. A high correlation
value (usually greater than 0.8) is considered to indicate good agreement. To measure the
correlation, we used three sequences (Sc1, dynamic road traffic; Sc9, dynamic train; and
Sc10, traffic roundabout) in UHD resolution for comparison. The results show that there is
a strong correlation between the subjective evaluation by the respondents and the objective
evaluation. One can see the correlation between these evaluations in Table 5.

Table 5. Correlation between subjective and objective evaluations.

                                         Sc1             Sc9             Sc10
                                     MOS    SSIM     MOS    SSIM     MOS    SSIM
Sequence 1: (15 Mbps, H.264, UHD)    3.80   0.929    2.27   0.968    3.60   0.962
Sequence 2: (10 Mbps, H.264, UHD)    3.43   0.903    3.10   0.961    3.57   0.955
Sequence 5: (15 Mbps, H.265, UHD)    3.60   0.942    2.87   0.974    3.73   0.971
Sequence 6: (10 Mbps, H.265, UHD)    3.47   0.929    3.17   0.969    3.67   0.967
Sequence 9: (5 Mbps, H.264, UHD)     1.67   0.820    2.53   0.938    2.73   0.928
Sequence 10: (5 Mbps, H.265, UHD)    3.63   0.894    2.97   0.953    3.57   0.956
Pearson correlation coefficient         0.917           0.981           0.968
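For reference, the Sc1 coefficient can be reproduced from the MOS and SSIM columns of Table 5. The `pearson` helper below is simply a plain implementation of the standard formula, not the authors' code:

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# MOS and SSIM columns for Sc1 from Table 5.
mos  = [3.8, 3.43, 3.6, 3.47, 1.67, 3.63]
ssim = [0.929, 0.903, 0.942, 0.929, 0.82, 0.894]

print(round(pearson(mos, ssim), 3))  # 0.917
```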

7. Discussion
We need to consider the purpose and location of capturing individual moments when
monitoring smart city footage. Depending on the importance of the captured part of the
city, we can define the necessary quality of the recording. If we need to address security, we
can use high-resolution security cameras such as Internet Protocol cameras (IP cameras),
which can produce a 4K resolution or better. However, when monitoring a certain event,
checking the traffic, or monitoring a location with a static background, we do not need the
best-resolution video. In this case, wireless cameras can be used, but their quality may not
match the reality of the viewed footage. The quality of the footage may be limited by an
insufficient Wi-Fi signal or a monitor/display with a lower resolution on which the video
footage is viewed. The selection of an individual system for deployment involves several
important aspects. Our recommendations for the setting of the quality parameters can help
to determine appropriate parameters. We can define sufficient quality for different types
of video sequences based on the deployment requirements. To achieve this, we created
a large set of video sequences, some of which had to be recorded multiple times due to
poor weather conditions or image interference. The final shots were of high quality, with
different object dynamics in the scenes and dynamic camera movement.

We have created a database of 4K video sequences that cover scenes from traffic or city
monitoring. Our goal is to expand this database with video sequences shot with different
devices, such as classic cameras, drones, mobile phones, and GoPro cameras. This will
help us to determine whether the quality is also affected by the camera on which the
video sequences are shot. In the future, we plan to extend the encoded sequences with
the H.266 and AV1 codecs and bit rates of 1, 3, 7, and 20 Mbps, to compare the ratings of
other combinations of quality parameters. We are also considering using other metrics for
objective evaluation and a larger sample for subjective evaluation.
We are looking for partners who can provide us with video sequences to improve
our monitoring system. Our team is interested in collaborating with the city of Zilina to
identify video sequences that could be used to enhance the system. We are also interested
in using some of the city's own recordings. Furthermore, we are looking for a reliable security
systems company to partner with and expand our database in the future. In addition,
we are interested in working with partners who can help us to film 8K sequences and
expand our laboratory with 8K imaging units to perform subjective tests. Although we
have reached out to other universities in Slovakia and the Czech Republic, the possibilities
are currently limited.
The reference sequences of our database are available at [43]. All of them, as well as
the encoded sequences, can be downloaded by researchers from the server using the File
Transfer Protocol (FTP). The FTP server is configured to allow passwordless access to users
at IP address 158.193.214.161 via any FTP client. Once connected, the user has access to
both reference and transcoded sequences. The “reference sequences” section contains the
names defined in the description of the dataset, while the “encoded sequences” section
contains sub-sequences for each resolution (full HD, ultra HD). The transcoded sequences’
names are defined by the key original sequence name_compression standard_bitrate. We
have a test web page that is currently being finalized, which will contain these sequences,
their descriptions, and a contact form where users can leave comments or advice. Until
the website is launched, interested parties can contact the researchers by email for more
information or to provide feedback.
Modern technology is rapidly developing in all areas of society. However, the potential
advantages and disadvantages of these technologies are often not sufficiently discussed.
Although they can make our lives easier and more efficient, they can also have a negative
impact on social relationships. An example is the use of industrial cameras in public spaces.
CCTV cameras are used in public spaces primarily for monitoring and crime prevention.
However, this type of surveillance raises human rights concerns that are often overlooked
in discussions about the use of modern technology. CCTV is intended for places where
increased security and public surveillance are needed, and smart technologies are used to
create a safer environment. Video recordings do not target individuals or their personal
belongings, but rather are used for research purposes. Anyone who downloads sequences
from our repository agrees to this statement.

8. Conclusions
The purpose of this research was to create 4K UHD video sequences to capture traffic
conditions in the city and monitor specific areas. The footage was intended to be used to
analyze quality requirements and provide recommendations for the implementation of
technologies such as smart cities or smart traffic. To begin, we determined the types of video
sequences that could be applicable in the smart cities or traffic sector. We selected video
sequences that provided slower but also more dynamic shots, as well as video sequences
where the camera movement was both static and dynamic, changing the characteristics
of the footage. We identified individual video scenes through spatial and temporal infor-
mation, knowing that camera movement also affects these values, producing a different
type of video sequence. For transportation, we chose Zilina’s available means of public
transportation, specifically the trolleybus coming and going from the public transport
stop, as well as its dynamic driving. We also recorded the traffic situation at lower and
higher speeds, including busy roads, roundabouts, and parked vehicles. We focused on rail
transport as well, recording slower trains arriving or leaving the station and faster-moving
trains. Selecting video sequences for smart cities was more difficult, as we needed to cover
different dynamics. We chose a sequence that monitored the movement of people in a
university town and flags flying as a demonstration of an object that could be recorded.
Monitoring systems record various situations, whether in the context of security or sensing
different situations, where the system helps to evaluate the appropriate response.
We used both objective and subjective methods to evaluate the tests conducted and,
based on the measurements obtained, we plan to propose a QoS model for the estimation
of triple-play services in our future work. Our next focus is to assess the quality of video
data delivery in various scenarios by simulating different values of packet loss and delay
in the network. The results of this study will help us to determine whether it is better for
video quality to receive packets in the incorrect order or to lose them entirely.
We plan to expand our database to include video sequences recorded with various
devices, including mobile phones, GoPro cameras, and conventional 4K cameras. This
comprehensive database will allow us to compare the resulting quality of the videos
captured by different devices. These comparisons will help to improve stream services. We
will also develop a prediction model that can calculate the resulting video quality based
on the network’s state and behavior. This model can be used by ISPs during the network
architecture design process.

Author Contributions: Conceptualization, L.S. and M.U.; methodology, L.S.; software, L.S.; valida-
tion, L.S., M.U. and J.F.; formal analysis, L.S., M.U. and J.F.; investigation, L.S.; resources, L.S.; data
curation, L.S., M.U. and J.F.; writing—original draft preparation, L.S.; writing—review and editing,
L.S., M.U. and J.F.; visualization, L.S. and M.U.; supervision, L.S.; project administration, L.S. and
M.U.; funding acquisition, L.S. and M.U. All authors have read and agreed to the published version
of the manuscript.
Funding: This work has been supported by the Slovak VEGA grant agency, Project No. 1/0588/22,
“Research of a location-aware system for achievement of QoE in 5G and B5G networks”.
Data Availability Statement: Our database's reference sequences can be found at
https://doi.org/10.5281/zenodo.10663664 [43], while the FTP server with IP address
158.193.214.161 also hosts encoded sequences and evaluations. Detailed information is
described in Section 7.
Conflicts of Interest: The authors declare no conflicts of interest.

References
1. Mercat, A.; Viitanen, M.; Vanne, J. UVG dataset: 50/120fps 4K sequences for video codec analysis and development. In
Proceedings of the 11th ACM Multimedia Systems Conference, MMSys ’20, Istanbul, Turkey, 8–11 June 2020. [CrossRef]
2. Song, L.; Tang, X.; Zhang, W.; Yang, X.; Xia, P. The SJTU 4K video sequence dataset. In Proceedings of the 2013 Fifth International
Workshop on Quality of Multimedia Experience (QoMEX), Klagenfurt am Wörthersee, Austria, 3–5 July 2013. [CrossRef]
3. Ghadiyaram, D.; Pan, J.; Bovik, A.C. A Subjective and Objective Study of Stalling Events in Mobile Streaming Videos. IEEE Trans.
Circuits Syst. Video Technol. 2019, 29, 183–197. [CrossRef]
4. Hosu, V.; Hahn, F.; Jenadeleh, M.; Lin, H.; Men, H.; Sziranyi, T.; Li, S.; Saupe, D. The Konstanz natural video database (KoNViD-
1k). In Proceedings of the 2017 Ninth International Conference on Quality of Multimedia Experience (QoMEX), Erfurt, Germany,
31 May–2 June 2017. [CrossRef]
5. Bampis, C.G.; Li, Z.; Katsavounidis, I.; Huang, T.Y.; Ekanadham, C.; Bovik, A. Towards Perceptually Optimized End-to-end
Adaptive Video Streaming. arXiv 2018, arXiv:1808.03898.
6. Ghadiyaram, D.; Pan, J.; Bovik, A.C.; Moorthy, A.K.; Panda, P.; Yang, K.C. In-Capture Mobile Video Distortions: A Study of
Subjective Behavior and Objective Algorithms. IEEE Trans. Circuits Syst. Video Technol. 2018, 28, 2061–2077. [CrossRef]
7. Duanmu, Z.; Ma, K.; Wang, Z. Quality-of-Experience for Adaptive Streaming Videos: An Expectation Confirmation Theory
Motivated Approach. IEEE Trans. Image Process. 2018, 27, 6135–6146. [CrossRef] [PubMed]
8. Sinno, Z.; Bovik, A.C. Large-Scale Study of Perceptual Video Quality. IEEE Trans. Image Process. 2019, 28, 612–627. [CrossRef]
[PubMed]
9. Long, C.; Cao, Y.; Jiang, T.; Zhang, Q. Edge Computing Framework for Cooperative Video Processing in Multimedia IoT Systems.
IEEE Trans. Multimed. 2018, 20, 1126–1139. [CrossRef]
10. Li, M.; Chen, H.L. Energy-Efficient Traffic Regulation and Scheduling for Video Streaming Services Over LTE-A Networks. IEEE
Trans. Mob. Comput. 2019, 18, 334–347. [CrossRef]
11. Grajek, T.; Stankowski, J.; Karwowski, D.; Klimaszewski, K.; Stankiewicz, O.; Wegner, K. Analysis of Video Quality Losses in
Homogeneous HEVC Video Transcoding. IEEE Access 2019, 7, 96764–96774. [CrossRef]
12. Ramachandra Rao, R.R.; Goring, S.; Robitza, W.; Feiten, B.; Raake, A. AVT-VQDB-UHD-1: A Large Scale Video Quality Database
for UHD-1. In Proceedings of the 2019 IEEE International Symposium on Multimedia (ISM), San Diego, CA, USA, 9–11 December
2019. [CrossRef]
13. Bouaafia, S.; Khemiri, R.; Sayadi, F.E. Rate-Distortion Performance Comparison: VVC vs. HEVC. In Proceedings of the 2021 18th
International Multi-Conference on Systems, Signals & Devices (SSD), Monastir, Tunisia, 22–25 March 2021. [CrossRef]
14. Mercat, A.; Makinen, A.; Sainio, J.; Lemmetti, A.; Viitanen, M.; Vanne, J. Comparative Rate-Distortion-Complexity Analysis of
VVC and HEVC Video Codecs. IEEE Access 2021, 9, 67813–67828. [CrossRef]
15. García-Lucas, D.; Cebrián-Márquez, G.; Cuenca, P. Rate-distortion/complexity analysis of HEVC, VVC and AV1 video codecs.
Multimed. Tools Appl. 2020, 79, 29621–29638. [CrossRef]
16. Topiwala, P.; Krishnan, M.; Dai, W. Performance comparison of VVC, AV1 and EVC. In Applications of Digital Image Processing
XLII; Tescher, A.G., Ebrahimi, T., Eds.; SPIE: Bellingham, WA, USA, 2019. [CrossRef]
17. Nguyen, T.; Wieckowski, A.; Bross, B.; Marpe, D. Objective Evaluation of the Practical Video Encoders VVenC, x265, and aomenc
AV1. In Proceedings of the 2021 Picture Coding Symposium (PCS), Bristol, UK, 29 June–2 July 2021. [CrossRef]
18. Nguyen, T.; Marpe, D. Compression efficiency analysis of AV1, VVC, and HEVC for random access applications. Apsipa Trans.
Signal Inf. Process. 2021, 10, e11. [CrossRef]
19. Valiandi, I.; Panayides, A.S.; Kyriacou, E.; Pattichis, C.S.; Pattichis, M.S. A Comparative Performance Assessment of Different
Video Codecs. In Lecture Notes in Computer Science; Springer Nature: Cham, Switzerland, 2023; pp. 265–275. [CrossRef]
20. Nguyen, T.; Marpe, D. Future Video Coding Technologies: A Performance Evaluation of AV1, JEM, VP9, and HM. In Proceedings
of the 2018 Picture Coding Symposium (PCS), San Francisco, CA, USA, 24–27 June 2018. [CrossRef]
21. Pourazad, M.T.; Sung, T.; Hu, H.; Wang, S.; Tohidypour, H.R.; Wang, Y.; Nasiopoulos, P.; Leung, V.C. Comparison of Emerging
Video Compression Schemes for Efficient Transmission of 4K and 8K HDR Video. In Proceedings of the 2021 IEEE International
Mediterranean Conference on Communications and Networking (MeditCom), Athens, Greece, 7–10 September 2021. [CrossRef]
22. Grois, D.; Giladi, A.; Choi, K.; Park, M.W.; Piao, Y.; Park, M.; Choi, K.P. Performance Comparison of Emerging EVC and VVC
Video Coding Standards with HEVC and AV1. In Proceedings of the SMPTE 2020 Annual Technical Conference and Exhibition,
Virtual, 10–12 November 2020. [CrossRef]
23. Wang, H.; Katsavounidis, I.; et al. VideoSet: A Large-Scale Compressed Video Quality Dataset Based on JND Measurement. J. Vis. Commun.
Image Represent. 2016, 46, 292–302. [CrossRef]
24. Karthikeyan, V.; Allan, B.; Nauck, D.D.; Rio, M. Benchmarking Video Service Quality: Quantifying the Viewer Impact of
Loss-Related Impairments. IEEE Trans. Netw. Serv. Manag. 2020, 17, 1640–1652. [CrossRef]
25. Kazemi, M.; Ghanbari, M.; Shirmohammadi, S. The Performance of Quality Metrics in Assessing Error-Concealed Video Quality.
IEEE Trans. Image Process. 2020, 29, 5937–5952. [CrossRef] [PubMed]
26. Diaz, C.; Perez, P.; Cabrera, J.; Ruiz, J.J.; Garcia, N. XLR (piXel Loss Rate): A Lightweight Indicator to Measure Video QoE in IP
Networks. IEEE Trans. Netw. Serv. Manag. 2020, 17, 1096–1109. [CrossRef]
27. Silva, C.A.G.D.; Pedroso, C.M. MAC-Layer Packet Loss Models for Wi-Fi Networks: A Survey. IEEE Access 2019, 7, 180512–180531.
[CrossRef]
28. Neves, F.; Soares, S.; Assuncao, P.A.A. Optimal voice packet classification for enhanced VoIP over priority-enabled networks.
J. Commun. Netw. 2018, 20, 554–564. [CrossRef]
29. Katsenou, A.V.; Dimitrov, G.; Ma, D.; Bull, D.R. BVI-SynTex: A Synthetic Video Texture Dataset for Video Compression and
Quality Assessment. IEEE Trans. Multimed. 2021, 23, 26–38. [CrossRef]
30. Badidi, E.; Moumane, K.; Ghazi, F.E. Opportunities, Applications, and Challenges of Edge-AI Enabled Video Analytics in Smart
Cities: A Systematic Review. IEEE Access 2023, 11, 80543–80572. [CrossRef]
31. Chen, Y.Y.; Lin, Y.H.; Hu, Y.C.; Hsia, C.H.; Lian, Y.A.; Jhong, S.Y. Distributed Real-Time Object Detection Based on Edge-Cloud
Collaboration for Smart Video Surveillance Applications. IEEE Access 2022, 10, 93745–93759. [CrossRef]
32. Yun, Q.; Leng, C. Intelligent Control of Urban Lighting System Based on Video Image Processing Technology. IEEE Access 2020,
8, 155506–155518. [CrossRef]
33. Smida, E.B.; Fantar, S.G.; Youssef, H. Video streaming challenges over vehicular ad-hoc networks in smart cities. In Proceedings
of the 2017 International Conference on Smart, Monitored and Controlled Cities (SM2C), Sfax, Tunisia, 17–19 February 2017.
[CrossRef]
34. Duan, Z.; Yang, Z.; Samoilenko, R.; Oza, D.S.; Jagadeesan, A.; Sun, M.; Ye, H.; Xiong, Z.; Zussman, G.; Kostic, Z. Smart City Traffic
Intersection: Impact of Video Quality and Scene Complexity on Precision and Inference. In Proceedings of the 2021 IEEE 23rd Int
Conf on High Performance Computing & Communications; 7th Int Conf on Data Science & Systems; 19th Int Conf on Smart City;
7th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys), Haikou,
China, 20–22 December 2021. [CrossRef]
35. Malik, M.; Prabha, C.; Soni, P.; Arya, V.; Alhalabi, W.A.; Gupta, B.B.; Albeshri, A.A.; Almomani, A. Machine Learning-Based
Automatic Litter Detection and Classification Using Neural Networks in Smart Cities. Int. J. Semant. Web Inf. Syst. 2023, 19, 1–20.
[CrossRef]
36. Li, B.; Zhang, W.; Tian, M.; Zhai, G.; Wang, X. Blindly Assess Quality of In-the-Wild Videos via Quality-Aware Pre-Training and
Motion Perception. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 5944–5958. [CrossRef]
37. Lee, S.; Roh, H.; Lee, N. Enhanced quality adaptation scheme for improving QoE of MPEG DASH. In Proceedings of the
2017 International Conference on Information and Communication Technology Convergence (ICTC), Jeju, Republic of Korea,
18–20 October 2017. [CrossRef]
38. Chang, S.H.; Wang, K.J.; Ho, J.M. Optimal DASH Video Scheduling over Variable-Bit-Rate Networks. In Proceedings of the 2018
9th International Symposium on Parallel Architectures, Algorithms and Programming (PAAP), Taipei, Taiwan, 26–28 December
2018. [CrossRef]
39. Mizdos, T.; Barkowsky, M.; Uhrina, M.; Pocta, P. How to reuse existing annotated image quality datasets to enlarge available training
data with new distortion types. Multimed. Tools Appl. 2021, 80, 28137–28159. [CrossRef]
40. Sevcik, L.; Voznak, M. Adaptive Reservation of Network Resources According to Video Classification Scenes. Sensors 2021,
21, 1949. [CrossRef]
41. ITU-T. Recommendation ITU-T P.800.1—Mean Opinion Score (MOS) Terminology. 2016. Available online: https://www.itu.int/
rec/T-REC-P.800.1 (accessed on 23 February 2024).
42. ITU-T. Recommendation ITU-T P.1204.5—Video Quality Assessment of Streaming Services over Reliable Transport for Resolutions
Up to 4K with Access to Transport and Received Pixel Information. 2023. Available online: https://www.itu.int/rec/T-REC-P.
1204.5 (accessed on 23 February 2024).
43. Sevcik, L. UHD Database Focus on Smart Cities and Smart Transport. Zenodo. 2024. Available online: https://doi.org/10.5281/ZENODO.10663664 (accessed on 23 February 2024).

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.