Visvesvaraya Technological University: A Technical Seminar Report On
18CSS84
A Technical Seminar Report on
ON THE CLASSIFICATION OF FOG COMPUTING APPLICATIONS: A MACHINE LEARNING PERSPECTIVE
CERTIFICATE
This is to certify that the technical seminar entitled “ON THE CLASSIFICATION OF FOG COMPUTING APPLICATIONS: A MACHINE LEARNING PERSPECTIVE”
External Viva:
1.…………………………………………………. …………………………………………………
2.…………………………………………………. ………………………………………………….
ACKNOWLEDGEMENT
Any achievement, be it scholastic or otherwise, does not depend solely on individual effort but on the guidance, encouragement, and cooperation of intellectuals, elders, and friends. I would like to take this opportunity to thank them all.
First and foremost, I would like to express my sincere regards and thanks to Mr. Pramod Gowda and Mr. Rajiv Gowda, CEOs, East Point Group of Institutions, Bangalore, for providing the necessary infrastructure and creating a good environment.
I express my gratitude to Dr. T. K. Sateesh, Principal, EPCET, who has always been a great source of inspiration.
I express my sincere regards and thanks to Dr. C. Emilin Shyni, Professor and Head of the Department of Computer Science and Engineering, EPCET, Bangalore, for her encouragement and support.
I am obliged to Dr. Heena Kousar, Associate Professor, who rendered valuable assistance as the seminar coordinator.
I am grateful to our reviewers Mr. Nithyananda, Associate Professor, and Mrs. Shravani, Assistant Professor, Department of Computer Science and Engineering, EPCET, Bangalore, for their valuable inputs and constant supervision in completing the technical seminar successfully.
I also extend my thanks to the entire faculty of the Department of Computer Science and Engineering, EPCET, Bangalore, who have encouraged me throughout the course of the seminar.
Lastly, I would like to thank my family and friends for their support and encouragement during
the course of my seminar.
ABSTRACT

Currently, Internet applications running on mobile devices generate a massive amount of data
that can be transmitted to a Cloud for processing. However, one fundamental limitation of a
Cloud is the connectivity with end devices. Fog Computing overcomes this limitation and
supports the requirements of time-sensitive applications by distributing computation,
communication, and storage services along the Cloud to Things (C2T) continuum, empowering
potential new applications, such as smart cities, augmented reality (AR), and virtual reality (VR).
However, the adoption of Fog-based computational resources and their integration with the
Cloud introduces new challenges in resource management, which requires the implementation of
new strategies to guarantee compliance with the quality of service (QoS) requirements of
applications. In this context, one major question is how to map the QoS requirements of
applications on Fog and Cloud resources. One possible approach is to discriminate the
applications arriving at the Fog into Classes of Service (CoS). This paper thus introduces a set of
CoS for Fog applications, which includes the QoS requirements that best characterize these Fog
applications. Moreover, this paper proposes the implementation of a typical machine learning
classification methodology to discriminate Fog Computing applications as a function of their
QoS requirements.
CONTENTS
1 Introduction
2 Literature Survey
3 Implementation and Architecture
3.4.2 Pre-Processing
3.4.3 Classification
4 Results and Discussion
5 Conclusion
References
LIST OF FIGURES
Figure 3.3 Accuracy rates concerning the testing time for classifiers trained with both clean and noisy datasets
Figure 3.4 Accuracy rates concerning the testing time for classifiers tested with both clean and noisy datasets
LIST OF TABLES
On the Classification of Fog computing applications: A Machine Learning Perspective
CHAPTER 1
INTRODUCTION
Cloud Computing enables ubiquitous access to shared pools of configurable resources and
services over the Internet that can be rapidly provisioned with minimal management effort.
However, with the increasing relevance of the Internet of Things (IoT) and of mobile and
multimedia applications, the transfer delays between the Cloud and an end device have been
deemed too long for latency-sensitive applications, making this the main limitation in the use of
the Cloud for latency-sensitive and mobile applications.
Fog computing aims at coping with these demands by hosting Cloud services on connected
heterogeneous devices, typically, but not exclusively, located at the edge of the network. The Fog
provides a geographically distributed architecture for computation, communication, and storage,
which targets real-time applications and mobile services. End-users benefit from pre-processing
of workloads, geo-distribution of resources, low latency responses, device heterogeneity [8], and
location/content awareness. The Fog can support the diversity of applications requirements in the
Cloud to Things (C2T) continuum, which is comprised of end devices, one or more levels of Fog
nodes, and the Cloud. Fog nodes located at the edge of the network are usually limited in
resources. Still, their use involves only brief delays in communication while the Cloud has a large
(“unlimited”) number of resources, but involves long delays in communication. On the lowest
level of this continuum, the initial processing can be carried out, and results passed on to a higher
layer in a Fog hierarchy, or to the Cloud itself for further processing.
Applications are usually composed of (dependent) tasks. The scheduling of tasks using C2T
resources is much more challenging than that of tasks on grids (confined systems) and on Clouds
(more homogeneous systems) due to the considerable heterogeneity of both demands of
applications and the capacity of the devices. Consequently, there is a need for schedulers to
analyze various parameters before making decisions as to where tasks and virtualized resources
should be run, including consideration of the availability of resources and their cost.
Objectives:
• To study classification algorithms, a very important category of supervised machine learning algorithms.
• To apply algorithms such as k-nearest neighbors, decision tree, Support Vector Machine, and ANN, in combination with other algorithms.
• To reduce the overall network delay for IoT devices responsible for collecting and analyzing real-time manufacturing data.
CHAPTER 2
LITERATURE SURVEY
Related Work:
Different studies have analyzed application requirements to develop service models for both
Cloud and Fog Computing. In Cloud computing, these studies have focused on Service Level
Agreements (SLA) and Quality of Experience (QoE) management, while in Fog Computing,
investigations have emphasized processing and analytics for specific applications, scheduling
of applications to resources, resource estimation and allocation for the processing of
applications, and service placement.
2. Wu et al. developed a Software as a Service (SaaS) broker for SLA negotiation. The
aim was to achieve the required service efficiently when negotiating with multiple
providers. The proposal involved the design of counter-offer strategies and decision-
making heuristics which considered time, market constraints, and trade-offs between
QoS parameters. Results demonstrated that the proposed approach increases profit
by 50% and the customer satisfaction level by 60%.
5. Hobfeld et al. discussed the challenges for QoE provisioning for Cloud applications
with emphasis on multimedia applications. The authors also presented a QoE-based
classification scheme of Cloud applications aligned to the end-user experience and
usage domain.
7. Cardellini et al. modified the Storm data stream processing (DSP) system to operate in
a geographically distributed and highly variable environment. To demonstrate the
effectiveness of the extended Storm system, the authors implemented a distributed
QoS-aware scheduling algorithm for placing DSP applications near the data sources
and the final consumers. The main limitation of this study is the instability of the
scheduling algorithm, which negatively affects application availability. Results
showed that the distributed QoS-aware scheduler outperforms the default centralized
one, improving application performance.
CHAPTER 3
IMPLEMENTATION AND ARCHITECTURE
Fog Computing enables new applications, especially those with strict latency constraints and those
involving mobility. These new applications will have heterogeneous QoS requirements and will
demand Fog management mechanisms to cope efficiently with that heterogeneity. Thus, resource
management in Fog computing is quite challenging, calling for integrated mechanisms capable of
dynamically adapting the allocation of resources. A very first step in resource management is to
separate incoming flow of requests into Classes of Service (CoS) according to their QoS
requirements.
The mapping of applications into a set of classes of service is the first step in the creation of a
resource management system capable of coping with the heterogeneity of Fog applications. This
paper proposes seven classes of service for Fog computing: Mission-critical, Real-time,
Interactive, Conversational, Streaming, CPU-bound, and Best-effort. These classes will be defined
and the typical applications using these classes identified.
• The first CoS is the Mission-critical (MC) class. It comprises applications with a low event
to action time-bound, regulatory compliance, military-grade security, privacy, and
applications in which a component failure would cause a significant increase in the safety
risk for people and the environment. Applications include healthcare and hospital systems,
medical localization, healthcare robotics, criminal justice, drone operations, industrial
control, financial transactions, ATM banking systems, and military and emergency
operations.
• The Real-time (RT) class, on the other hand, groups applications requiring tight timing
constraints in conjunction with effective data delivery. In this case, the speed of response
in real-time applications is critical, since data are processed at the same time they are
generated. In addition to being delay sensitive, real-time applications often require a
minimum transmission rate and can tolerate a certain amount of data loss. This real-time
class includes applications such as online gaming, virtual reality, and augmented reality.
• The third class is denominated Interactive (IN). In this case, responsiveness is critical, with
the time between a user request and the action manifested at the client being less than a
few seconds. Moreover, users of interactive applications can be end devices or individuals.
Examples of applications belonging to this class are interactive television, web browsing,
database retrieval, server access, automatic database inquiries by telemachines, polling for
measurement collection, and some IoT deployments.
• The fourth class is the Conversational (CO) class. These applications include some video
and Voice-over-IP (VoIP) applications. They are characterized by being delay-sensitive but
loss-tolerant: delays of less than 150 milliseconds are not perceived by humans, delays
between 150 and 400 milliseconds can be acceptable, and those exceeding 400 milliseconds
result in completely unintelligible voice conversations.
• The fifth class of service is Streaming (ST), which releases the user from downloading
entire files before playout begins, although potentially long delays may still be incurred.
Streaming applications are accessed by users on demand and must guarantee interactivity
and continuous playout to the user. For this reason, the most critical performance measure
for streaming video is average throughput.
• The sixth class is the CPU-Bound (CB) class, which is used by applications involving
complex processing models, such as those in decision making, which may demand hours,
days, or even months of processing. Face recognition, animation rendering, speech
processing, and distributed camera networks are examples of CPU-Bound applications.
The table below shows the range of QoS requirement values for each class of service: Bandwidth,
Reliability, Security, Data storage, Data location, Mobility, Scalability, Delay sensitivity, and Loss
sensitivity. These ranges are used to generate the synthetic dataset of Fog applications and are
employed for training and testing samples to evaluate the classifiers in this paper.
The possibility of having a hierarchical layered system is one of the significant differences between
Fog computing and Edge computing. Edge computing is mainly concerned with bringing the
computation facilities closer to the user, but in a flat, non-hierarchical architecture. A layered
architecture can introduce additional communication overhead for processing tasks at different
layers. However, it has been shown that if the scheduling of tasks and resource reservations are
properly carried out, processing in a hierarchical architecture can reduce communication latency
and task waiting time when compared to a flat architecture.
In the scenario assumed in this paper, users subscribe directly or indirectly to Fog infrastructure
services. The first packet of a flow contains the QoS requirements of the application generating
the packet flow. The proposed classifier will then map this application into a CoS using the
information provided in the first packet. Alternatively, the first packet could already carry the CoS
of the application. However, such an option would make the CoS adopted by the Fog provider
rigid, preventing the redefinition of the CoS for the handling of new applications with unique
QoS requirements.
This methodology is for choosing and evaluating classifiers for Fog computing applications. It
provides a step-by-step procedure for making grounded choices of classifiers. Indeed, this
methodology can easily be modified for the classification of applications in other networked
systems.
Classification techniques based on ML aim at mapping a set of new input data to a set of discrete
or continuous valued outputs. The key steps in building a classifier of Fog applications based on
ML algorithms are summarized below. In this paper, the classification steps were executed offline.
Indeed, the best performing classifier evaluated in these steps can be executed online in an
operational Fog.
To train and test the classifiers employed in this paper, we built a dataset composed of 14,000
mutually exclusive applications generated from data in the intervals of values acceptable for each
QoS requirement of the application. 90% of the data were reserved for training, while the
remaining 10% were used for testing. It was assumed that each incoming application had additional
fields containing nine QoS requirements, from now on referred to as “attributes”: Bandwidth,
Reliability, Security, Data Storage, Data location, Mobility, Scalability, Delay sensitivity, and
Loss sensitivity.
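A minimal sketch of how such a synthetic dataset and its 90/10 split might be produced. The attribute ranges and class intervals below are illustrative placeholders, not the paper's actual intervals:

```python
import numpy as np

rng = np.random.default_rng(42)

ATTRIBUTES = ["Bandwidth", "Reliability", "Security", "Data storage",
              "Data location", "Mobility", "Scalability",
              "Delay sensitivity", "Loss sensitivity"]
CLASSES = ["MC", "RT", "IN", "CO", "ST", "CB", "BE"]  # the seven CoS

N_SAMPLES = 14_000

# Each application: nine attribute values drawn uniformly from a
# per-class interval (placeholder bounds; the paper defines real ones).
labels = rng.integers(0, len(CLASSES), size=N_SAMPLES)
low = labels[:, None] * 0.1          # illustrative per-class lower bounds
X = rng.uniform(low, low + 0.5, size=(N_SAMPLES, len(ATTRIBUTES)))

# 90% of the data for training, the remaining 10% for testing.
split = int(0.9 * N_SAMPLES)
perm = rng.permutation(N_SAMPLES)
train_idx, test_idx = perm[:split], perm[split:]
X_train, y_train = X[train_idx], labels[train_idx]
X_test, y_test = X[test_idx], labels[test_idx]
```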
3.4.2. Pre-Processing
Z-score normalization was used to adjust attribute values defined on different scales. The mean
and standard deviation were computed on the training set, and the same mean and standard
deviation were then used to normalize the testing set.
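This step can be sketched as follows; the key point is that the testing set is normalized with the training set's statistics, never with its own:

```python
import numpy as np

def zscore_fit(train):
    """Compute per-attribute mean and standard deviation on the training set."""
    mu = train.mean(axis=0)
    sigma = train.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard against constant attributes
    return mu, sigma

def zscore_apply(data, mu, sigma):
    """Normalize any split with the statistics learned from training."""
    return (data - mu) / sigma

# Two attributes on very different scales (illustrative values)
train = np.array([[10.0, 200.0], [20.0, 400.0], [30.0, 600.0]])
test = np.array([[25.0, 500.0]])

mu, sigma = zscore_fit(train)
train_z = zscore_apply(train, mu, sigma)
test_z = zscore_apply(test, mu, sigma)  # same mu/sigma as training
```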
Two techniques can be used for dimensionality reduction. The first is feature selection, using
techniques such as Relief-F, CFS, MCFS, and the Student's t-test, which rank the given feature set
so that the least significant features can be removed from the problem. The second is feature
extraction, using techniques such as Principal Component Analysis (PCA), which creates new
features from the given feature set; the resulting number of features is smaller than in the initial
set.
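As an illustration of the feature-extraction route, a PCA sketch using scikit-learn (assumed available; Relief-F and CFS would need dedicated packages). The data here are synthetic and deliberately low-rank so that fewer components suffice:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# 100 samples with nine correlated attributes: the last five columns
# are linear combinations of the first four (illustrative data).
base = rng.normal(size=(100, 4))
X = np.hstack([base, base @ rng.normal(size=(4, 5))])  # shape (100, 9)

# Keep enough principal components to explain 95% of the variance;
# the resulting feature count is smaller than the initial nine.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)
```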
The correlation matrix shows that there is a statistical association of more than 50% between the
following variables: “Data storage” and “Data location” (0.623), “Data storage” and “Delay
sensitivity” (0.596), “Loss sensitivity” and “Mobility” (0.563), “Bandwidth” and “Scalability”
(−0.605), and “Data location” and “Delay sensitivity” (0.674). The “−” symbol in the correlation
value between the attributes “Bandwidth” and “Scalability” indicates an inverse relationship
between the two.
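A correlation matrix like the one described can be computed directly from the attribute columns. The data below are illustrative, constructed so that Bandwidth and Scalability are inversely related; the paper's reported values come from its own synthetic dataset:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000

bandwidth = rng.normal(size=n)
scalability = -0.6 * bandwidth + rng.normal(scale=0.8, size=n)  # inverse relation
data_storage = rng.normal(size=n)
delay_sens = 0.6 * data_storage + rng.normal(scale=0.8, size=n)

X = np.column_stack([bandwidth, scalability, data_storage, delay_sens])
corr = np.corrcoef(X, rowvar=False)  # 4x4 Pearson correlation matrix

# A negative entry indicates an inverse relationship between attributes.
bw_scal = corr[0, 1]
```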
3.4.3. Classification
Seven classifiers are evaluated for potential adoption: an adaptive neuro-fuzzy inference system
built from data using subtractive clustering (ANFIS); Decision Tree (DT); an Artificial Neural
Network with 2 hidden layers, trained with the Levenberg-Marquardt backpropagation algorithm
(ANN(1)); an Artificial Neural Network with 1 hidden layer, trained with the scaled conjugate
gradient backpropagation algorithm (ANN(2)); an Artificial Neural Network with 2 hidden layers
(ANN(3)); k-Nearest Neighbors (KNN); and Support Vector Machine (SVM).
The performance of each classifier is first evaluated under ideal conditions, that is, without noise.
Each classifier is assessed by measuring its accuracy and efficiency. Then, the performance of each
classifier is assessed in the presence of noise. In this case, the robustness of each classifier is
assessed in two different scenarios: when noise is introduced into the training set, and when noise
is introduced into the testing set. In both cases, with and without noise, Wilcoxon's signed-rank
test is employed to determine whether or not the difference in accuracy between two classifiers is
statistically significant. Inserting noise into the training dataset is designed to evaluate the
robustness of the classifier trained with that dataset. Inserting noise into the test dataset aims at
assessing the robustness of a classifier in the face of datasets in which data relationships are not
so evident.
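A sketch of this evaluation loop using scikit-learn and SciPy. This is a simplified stand-in (only two of the seven classifiers, on illustrative data rather than the paper's dataset), but it shows the pairing of per-fold accuracies with Wilcoxon's signed-rank test:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from scipy.stats import wilcoxon

# Illustrative stand-in for the synthetic Fog dataset: 9 attributes, 7 classes
X, y = make_classification(n_samples=1_400, n_features=9, n_informative=6,
                           n_classes=7, random_state=0)

# Per-fold accuracies for two of the candidate classifiers
dt_acc = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)
knn_acc = cross_val_score(KNeighborsClassifier(), X, y, cv=10)

# Wilcoxon's signed-rank test on the paired fold accuracies:
# a small p-value suggests the accuracy difference is significant.
stat, p_value = wilcoxon(dt_acc, knn_acc)
```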
• The average accuracy results for the testing process were close to 100%, except for
ANFIS, which was only 99.2% accurate. The shortest prediction time was obtained
with the DT algorithm, while KNN took the longest time. One possible explanation
is that the DT algorithm divides the input space in a way that matches how the
attributes were defined and assigned to each CoS by using intervals. The way the
synthetic dataset was created may have led to application instances with properties
which were easy to model, thus boosting the performance of the evaluated
classifiers.
Noise was introduced into the training partitions to create a noisy data set from the original, as
follows:
1. Noise as a percentage of the original value was introduced into a copy of the full original
dataset.
2. The two datasets, the original and the noisy copy, were partitioned into 10 equal folds, i.e.,
each with the same number of examples of each class.
3. The training partitions were built from the noisy copy, whereas the test partitions were
formed from examples from the noise-free dataset.
Noise was introduced into the testing partitions following the same steps described above, except
that in Step 3 the testing partitions are built from the noisy copy, while the training partitions were
formed from examples from the noise-free dataset.
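The noise-injection step might be sketched as follows, with noise applied as a percentage perturbation of each original attribute value. The exact noise model used in the paper may differ; this is one plausible reading of "noise as a percentage of the original value":

```python
import numpy as np

def add_attribute_noise(X, noise_level, rng):
    """Perturb each attribute by up to +/- noise_level (a fraction,
    e.g. 0.5 for a 50% noise level) of its original value."""
    factors = rng.uniform(-noise_level, noise_level, size=X.shape)
    return X * (1.0 + factors)

rng = np.random.default_rng(7)
X_clean = rng.uniform(1.0, 10.0, size=(1_400, 9))  # illustrative fold

# Scenario 1: noisy training folds, clean testing folds (50% noise level)
X_noisy = add_attribute_noise(X_clean, 0.50, rng)

# Relative deviation from the clean values never exceeds the noise level
rel_dev = np.abs(X_noisy - X_clean) / np.abs(X_clean)
```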
After introducing noise, the accuracy of classifiers is determined by means of 5 runs of a stratified
10-Fold Cross-Validation (FCV). Hence, a total of 50 runs per dataset, noise type, and level are
averaged. Ten partitions make the noise effects more notable, since each partition has a large
number of examples (1,400). The robustness of each algorithm is then estimated by using the
Relative Loss of Accuracy (RLA) given by Equation 1:

RLA_x% = (Acc_0% − Acc_x%) / Acc_0%    (1)
where RLAx% is the relative loss of accuracy at a noise level of x%. Acc0% is the test accuracy
in the original case, that is, with 0% of induced noise, and Accx% is the test accuracy with a noise
level x%.
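Equation 1 translates directly into code:

```python
def relative_loss_of_accuracy(acc_clean, acc_noisy):
    """RLA_x% = (Acc_0% - Acc_x%) / Acc_0%: the fraction of the
    clean-data accuracy lost at a noise level of x%."""
    return (acc_clean - acc_noisy) / acc_clean

# e.g. a classifier at 99% accuracy on clean data and 90% under noise
rla = relative_loss_of_accuracy(0.99, 0.90)  # ~0.0909, i.e. ~9.1% loss
```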
Next, the robustness of the classifiers when the noise has been introduced for the two mentioned
scenarios will be evaluated.
3.4.4.1 Classification using a training set with attribute noise and a clean
testing set :
• As can be observed in Table 5, the DT is the most robust classifier for all noise
levels. The ANN(1), ANN(2), and ANN(3) present high robustness for low noise
levels (10-30%), but the RLA of these neural-network-based classifiers rises
linearly to 9% when the noise level is 50%. The least robust classifier is the ANFIS,
for which the loss of accuracy increases exponentially as the noise level rises, to
the point that when the noise level is 50%, its RLA is above 21% relative to a clean
dataset.
• Figure 3.3 shows the accuracy rate and testing time when training takes place with
both clean datasets and those disrupted by uniform attribute noise levels of 10%,
30%, and 50%. A marker identifies each classification algorithm, and a different
color identifies each noise level. The light-blue bands indicate the areas of the
greatest accuracy or the slowest testing times, and the light-purple intersection of
these bands indicates the area where the best results for both accuracy and testing
time are found.
• The DT algorithm takes only 25 milliseconds for classification with the greatest
accuracy for up to 1,400 applications simultaneously arriving at the edge, when
training has taken place using datasets with a uniform attribute noise level of 50%.
Figure 3.3. Accuracy rates concerning the testing time for classifiers trained with both clean and
noisy datasets.
3.4.4.2 Classification using a clean training set and a testing set with
attribute noise
• Table 7 shows the results for average performance and robustness for each classification
algorithm at each noise level, from 0% to 50%, from the testing of datasets with uniform
attribute noise.
• As evinced in Table 7, for all classifiers, accuracy decreases exponentially with an increase
in the noise level of the testing dataset. In this situation, the most robust classifiers are
ANN(2), DT, ANN(3), ANN(1), and KNN.
• Figure 3.4 illustrates the results of the classification algorithms when both accuracy and
testing time are considered, with both clean and noisy datasets (noise levels of 10%, 30%,
and 50%). A
marker identifies each classification algorithm, and a different color identifies each noise
level. The light-blue bands indicate the areas of the greatest
accuracy rates or the slowest testing times, and the light-purple intersection of these
bands indicates the area where the best results were obtained when considering both
accuracy and testing time.
Figure 3.4. Accuracy rates concerning the testing time for classifiers tested with both clean and
noisy datasets.
CHAPTER 4
RESULTS AND DISCUSSION
The final step of the proposed methodology is the selection of a classifier. The results indicate that
the Decision Tree was the most accurate and robust classifier.
The decision tree algorithm does not need to assess all the attributes to classify an application since
various services have exclusive features. For example, mission-critical applications are the only
ones for which reliability takes on a “critical” value. Therefore, assessing certain features makes
classification a more efficient process. This is an attractive characteristic which makes Decision
Tree an ideal algorithm for the classification of applications in Fog computing.
Decision Tree:
Decision tree is a very powerful tool for classification and prediction. It has been widely used to
represent classification models because of its ability to create a coherent classification while
achieving a good level of accuracy. The goal of a decision tree induction algorithm, automatically
constructed for a given dataset, is to find the optimal decisions by minimizing the generalization
error. Due to their non-parametric nature, decision trees can be applied to either classification or
regression tasks. Partitioning the training data into smaller fragments, or child partitions, improves
the homogeneity of the instances within each partition. The induction algorithm chooses splits by
calculating an impurity measure, such as the entropy, of the candidate child partitions. A new
instance is classified starting at the root of the decision tree, where the attribute associated with
that node is tested. The outcome of this test determines the branch to follow down the tree,
according to the attribute value of the given instance, and the process is repeated until a leaf is
reached.
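A brief sketch of this behavior with scikit-learn. The toy encoding below is hypothetical, not the paper's dataset: reliability value 3 ("critical") occurs only in Mission-critical applications, so the tree can separate that class by testing a single attribute, mirroring the efficiency argument above:

```python
from sklearn.tree import DecisionTreeClassifier

# Toy encoding: [reliability, delay_sensitivity]; reliability == 3
# ("critical") occurs only in Mission-critical (MC) applications.
X = [[3, 2], [3, 1], [1, 2], [1, 1], [2, 2], [2, 1]]
y = ["MC", "MC", "BE", "IN", "BE", "IN"]

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Walking from the root, a single test on reliability suffices for MC.
pred = tree.predict([[3, 1]])
```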
CHAPTER 5
CONCLUSION
⚫ First, potential Fog computing applications are grouped into seven CoS according to their QoS
requirements. A synthetic database of Fog applications is built from the definition of the
intervals of values that each QoS requirement can take for a specific class.
⚫ Next, the dataset is pre-processed to convert previously unusable data into data that can be
used by ML techniques. A set of popular ML algorithms is then selected and put through the
training and testing processes, using the examples in the synthetic database to measure the
accuracy and efficiency of their prediction of the CoS to which an application belongs. The
results showed that the SVM algorithm has greater accuracy than k-means, Decision Tree,
Random Forest, Naïve Bayes, and C4.5.
⚫ This ML-based classification methodology allows the implementation of CoS to manage the
traffic in Fog, which constitutes a first step in the definition of QoS provisioning mechanisms
in the C2T continuum.
⚫ For future work, an ML-based classification algorithm will be integrated into the Fog network
scheduler, thus enabling the scheduler to prioritize processing requests. It will also allow more
delay-sensitive demands to be satisfactorily fulfilled.
REFERENCES

[1] P. Mell, T. Grance, The NIST Definition of Cloud Computing, Tech. Rep. NIST Special
Publication (SP) 800-145, National Institute of Standards and Technology (Sep. 2011).
[2] Open Fog Reference Architecture: OpenFog Consortium. Available:
https://www.openfogconsortium.org/ra/ [Accessed: 24/05/2017] (2017).
[3] N. Alkassab, C. T. Huang, Y. Chen, B. Y. Choi, S. Song, Benefits and schemes of
prefetching from cloud to fog networks, in: 2017 IEEE 6th International Conference on Cloud
Networking (CloudNet), 2017, pp. 1–5.
[4] L. F. Bittencourt, J. Diaz-Montes, R. Buyya, O. F. Rana, M. Parashar, Mobility-aware
application scheduling in fog computing, IEEE Cloud Computing 4 (2) (2017) 26–35.
[5] P. Hu, S. Dhelim, H. Ning, T. Qiu, Survey on fog computing: architecture, key
technologies, applications and open issues, Journal of Network and Computer Applications 98
(2017) 27–42.
[6] Kumari, S. Tanwar, S. Tyagi, N. Kumar, R. M. Parizi, K.-K. R. Choo, Fog data analytics:
A taxonomy and process model, Journal of Network and Computer Applications 128 (2019) 90–
104.
[7] C. C. Byers, Architectural Imperatives for Fog Computing: Use Cases, Requirements, and
Architectural Techniques for Fog-Enabled IoT Networks, IEEE Communications Magazine 55 (8)
(2017) 14–20. doi: 10.1109/MCOM.2017.1600885.
[8] F. Bonomi, R. Milito, J. Zhu, S. Addepalli, Fog Computing and Its Role in the Internet of
Things, in: Proceedings of the First Edition of the MCC Workshop on Mobile Cloud Computing,
MCC ’12, ACM, New York, NY, USA, 2012, pp. 13–16.
[9] T. Wang, Y. Liang, W. Jia, M. Arif, A. Liu, M. Xie, Coupling resource management based
on fog computing in smart city systems, Journal of Network and Computer Applications 135
(2019) 11–19.
[10] R. Deng, R. Lu, C. Lai, T. H. Luan, H. Liang, Optimal workload allocation in fog-cloud
computing toward balanced delay and power consumption, IEEE Internet of Things Journal 3 (6)
(2016) 1171–1181.