Chapter Eight
Future Directions in Data Analytics for Cyber Security
School of Information Technology and Engineering
Addis Ababa Institute of Technology
Addis Ababa University
Jan 2025
by Senait Desalegn
Contents
Cyber-Physical Systems
IoT vs CPS
Vulnerabilities and Security Challenges to CPS
Multi Domain Mining
Deep Learning
Deep Learning Challenges
Generative Adversarial Networks
Ethical Thinking in the Data Analytics Process
What Are Cyber-Physical Systems (CPS)?
Examples Of Cyber-Physical Systems
Autonomous Vehicles: Self-driving cars and trucks use sensors and AI for
navigation.
Smart Grids: The physical power grid infrastructure is interconnected with
digital systems for real-time monitoring and control.
Medical Devices: Implantable devices like pacemakers use sensors to monitor
and adjust therapy automatically. These devices integrate sensors and
actuators within the human body to maintain physiological functions.
Building Automation: Smart buildings use CPS to control HVAC, lighting, and
security systems.
Key Features Of Cyber-Physical Systems
Cyber-physical systems (CPS) have distinct characteristics that set them apart
from traditional computer systems and embedded devices. These features drive their
transformative impact across various domains:
Integration of Cyber and Physical Components: CPS are not merely software
running on hardware; they are integrated systems where the cyber and
physical components are deeply intertwined. This integration allows for real-
time interaction and feedback between the digital and physical worlds.
Real-Time Operation: CPS operate in real time and respond to changes in their
physical environment during operation.
Data-Driven Decision Making: CPS rely on data collected from sensors and
other sources. They use this data to make informed decisions and adapt to
changing conditions.
IoT vs CPS
The Internet of Things (IoT) and Cyber-Physical Systems (CPS) have some
similarities, but there are key distinctions in their capabilities and applications.
Internet-of-Things (IoT)
Focus: IoT connects everyday objects to the internet to collect and exchange
data.
Functionality: Involves simple tasks like sensing and transmitting data, such as
a smart thermostat adjusting temperature based on occupancy.
Control: Limited control over the physical environment, focusing more on data
collection and communication.
Vulnerabilities and security challenges to CPS
Complexity: CPS are complex, with diverse components and intricate
interactions, making them difficult to secure. Attackers can find vulnerabilities
in individual components or in the communication networks that connect
them.
Connectivity: The increasing connectivity of CPS exposes them to a wider
range of cyber threats. Attackers can gain access to CPS through the internet
or other networks, causing significant damage.
Legacy Systems: Many infrastructures rely on legacy OT systems that were not
designed with cyber security in mind. These outdated systems are often
poorly patched and lack basic security features, making them easy targets for
attackers.
Temporal Evaluation of a Sensor Network
Data Analytics in Cyber-Physical Systems (CPS)
Sensor networking and its applications can be adapted to
Industrial Control Systems (ICS) and computer networks to
support cyber security.
Let us consider part of a sensor network with nodes
A, B, C and D.
In (a), we evaluate the nodes based on their connections, i.e., the
number of links incident on each node.
Here, nodes A and C can be considered important because they are
connecting hubs on which a large number of edges are incident,
indicating a high degree of communication.
Similarly, node B can be considered important because it is the
connector between the two hubs.
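The link-counting view of (a) can be sketched in Python. The topology below is hypothetical, since the figure's actual edges are not reproduced here: two hubs A and C, each with its own leaf sensors, joined through node B. Degree counts the links incident on each node, and a brute-force connectivity check flags nodes whose removal splits the network (articulation points), which is one simple way to capture B's role as a connector.

```python
from collections import defaultdict, deque

# Hypothetical edge list standing in for panel (a): hubs A and C, each with
# its own leaf sensors, joined through connector node B.
EDGES = [
    ("A", "a1"), ("A", "a2"), ("A", "a3"),
    ("C", "c1"), ("C", "c2"), ("C", "c3"),
    ("A", "B"), ("B", "C"),
]

def build_graph(edges):
    """Build an undirected graph as a dict of adjacency sets."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph

def degree_ranking(graph):
    """Nodes sorted by degree (number of incident links), highest first."""
    return sorted(graph, key=lambda n: (-len(graph[n]), n))

def is_connected_without(graph, removed):
    """BFS with `removed` deleted; True if every remaining node is reachable."""
    nodes = [n for n in graph if n != removed]
    start = nodes[0]
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr != removed and nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return len(seen) == len(nodes)

graph = build_graph(EDGES)
ranking = degree_ranking(graph)
print("top by degree:", [(n, len(graph[n])) for n in ranking[:3]])
# Connector (articulation) nodes: removing any one of them disconnects the network.
connectors = [n for n in graph if not is_connected_without(graph, n)]
print("connectors:", sorted(connectors))
```

In this toy graph the hubs are also cut vertices; B stands out as the node with low degree (2) whose removal nevertheless splits the network in two.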
However, in (b), the scenario is different: it is difficult to
determine the importance of nodes from their links alone.
Thus, simple metrics such as degree centrality, which count the
edges incident on a node, may not be the best choice, or may not
even be feasible.
Evaluating critical nodes based on their behavior over time
is much more useful here.
This behavior can be captured through relationships represented by
edges between nodes, which could stand for data transfers
between nodes, as in a computer network or an ICS
network. For example, (c) shows the data transfers at times t1, t2
and t3 for the network in (b).
These transfers can now be used to find link-based or node-based
patterns and to rank links and nodes by importance, in terms of how
many times they appear in the temporal windows, as
shown in (d).
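The ranking by appearances in temporal windows can be sketched as simple counting. The transfers in each window are hypothetical stand-ins for panel (c); links are treated as undirected and counted at most once per window.

```python
from collections import Counter

# Hypothetical data transfers (source, destination) per temporal window,
# standing in for panel (c); the figure's real values are not reproduced.
WINDOWS = {
    "t1": [("A", "B"), ("B", "C")],
    "t2": [("B", "D"), ("C", "D")],
    "t3": [("A", "B"), ("D", "B")],
}

def count_appearances(windows):
    """Count how many windows each link and each node appears in."""
    link_counts, node_counts = Counter(), Counter()
    for transfers in windows.values():
        # Treat links as undirected and count each one at most once per window.
        links = {tuple(sorted(pair)) for pair in transfers}
        nodes = {node for link in links for node in link}
        link_counts.update(links)
        node_counts.update(nodes)
    return link_counts, node_counts

def ranked(counts):
    """Sort by descending count, breaking ties alphabetically."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

link_counts, node_counts = count_appearances(WINDOWS)
print("links:", ranked(link_counts))
print("nodes:", ranked(node_counts))
```

Node B has few links in any single window but is active in all three, so it ranks first by temporal appearance even though a snapshot degree count would not single it out.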
These patterns can be mined using association rule mining.
Certain time periods may be more important and may thus be given
more weight; such weighted patterns may be mined using quantitative
association rule based methods.
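Weighting time periods can be illustrated with weighted support: each window is a transaction of active links, and a window's weight counts toward the support of every itemset it contains. This is a simple weighted-support sketch, not a full quantitative association rule miner; the windows, weights and thresholds are hypothetical.

```python
from itertools import combinations

# Hypothetical transactions: links active in each time window.
WINDOWS = {
    "t1": {"A-B", "B-C"},
    "t2": {"B-D", "C-D"},
    "t3": {"A-B", "B-D"},
}
# Hypothetical weights: t3 is treated as a more important period.
WEIGHTS = {"t1": 1.0, "t2": 1.0, "t3": 2.0}

def weighted_support(itemset, windows, weights):
    """Share of total window weight carried by windows containing every item."""
    total = sum(weights.values())
    hit = sum(weights[w] for w, items in windows.items() if itemset <= items)
    return hit / total

def mine_rules(windows, weights, min_support=0.3, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) for single-item rules lhs -> rhs."""
    items = sorted(set().union(*windows.values()))
    rules = []
    for x, y in combinations(items, 2):
        support = weighted_support({x, y}, windows, weights)
        if support < min_support:
            continue
        for lhs, rhs in ((x, y), (y, x)):
            confidence = support / weighted_support({lhs}, windows, weights)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules

uniform = {w: 1.0 for w in WINDOWS}
print("uniform weights:", mine_rules(WINDOWS, uniform))
print("weighted:", mine_rules(WINDOWS, WEIGHTS))
```

With uniform weights the rules found are B-C -> A-B and C-D -> B-D; up-weighting t3 pushes those pairs below the support threshold and surfaces A-B -> B-D and B-D -> A-B instead, each with confidence 2/3.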
Internet of Things (IoT)
Some studies use the Internet of Things (IoT) interchangeably with CPS,
while others consider it complementary to CPS.
IoT has indeed become a large share of the CPS space due to the
explosion in the number of devices and advances in smart
devices and sensors.
One such viewpoint is that of geospatial data, which can enhance
knowledge of an IP session or even the IP reputation score itself.
Multi Domain Mining: Integrating Multiple Heterogeneous Data
Current reputation systems classify entities into a whitelist or a
blacklist, i.e., a binary categorization.
Separate lists are maintained for URLs and IP addresses.
Some tools that provide such lists include Cisco SenderBase
(https://www.senderbase.org/), VirusTotal IP reputation
(https://www.virustotal.com/) and the Spam and Open Relay
Blocking System (SORBS) (http://www.sorbs.net/).
Most of these tools and lists are based on single-dimensional features,
with no correlation among them.
This shortcoming degrades a system’s effectiveness in detecting
sophisticated attacks and terminating malicious activities.
However, the set of attributes that the reputation scoring considers
can be enriched, providing an expressive scoring system that enables
an administrator to understand what is at stake, and increasing
robustness by correlating the various pieces of information while
factoring in the trustworthiness of their sources
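One simple way to correlate several pieces of evidence while factoring in the trustworthiness of their sources is a trust-weighted average, which yields a graded score rather than a binary whitelist/blacklist decision and shows an administrator which sources drive it. The source names, scores, trust weights and label thresholds below are all hypothetical.

```python
# Hypothetical evidence about one IP address: each source reports a
# maliciousness score in [0, 1] and carries a trust weight in [0, 1].
EVIDENCE = [
    {"source": "blacklist_feed", "score": 1.0, "trust": 0.9},
    {"source": "geo_context", "score": 0.6, "trust": 0.5},
    {"source": "session_stats", "score": 0.2, "trust": 0.7},
]

def reputation(evidence):
    """Trust-weighted average score plus each source's share of it."""
    total_trust = sum(e["trust"] for e in evidence)
    if total_trust == 0:
        raise ValueError("no trusted evidence")
    contributions = {
        e["source"]: e["trust"] * e["score"] / total_trust for e in evidence
    }
    return sum(contributions.values()), contributions

def label(score, suspicious=0.4, malicious=0.7):
    """Map the score to a graded label; thresholds are illustrative only."""
    if score >= malicious:
        return "malicious"
    if score >= suspicious:
        return "suspicious"
    return "benign"

score, parts = reputation(EVIDENCE)
print(f"score={score:.3f} label={label(score)}")
for source, share in sorted(parts.items(), key=lambda kv: -kv[1]):
    print(f"  {source}: {share:.3f}")
```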
Multi Domain Mining: Integrating Multiple Heterogeneous Data (IP Reputation Scoring)
An IP reputation scoring model can be enriched with network session
features and geo-contextual features, such that an incoming session IP
is labeled based on the most similar IP addresses, in terms of both
network features and geo-contextual features.
This can provide better threat assessment by considering not only the
network features but also additional context, such as geospatial
context information collected from external sources.
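Labeling an incoming session IP by its most similar known IPs can be sketched as a k-nearest-neighbour vote over a distance that blends network and geo-contextual features. The labelled history, the feature choices, the assumption that features are pre-scaled to [0, 1], k, and the `geo_weight` blend are all hypothetical; the addresses are documentation-range placeholders.

```python
import math
from collections import Counter

# Hypothetical labelled history: (ip, network_features, geo_features, label).
# Network features might be e.g. a scaled session rate and failed-login ratio;
# the geo feature might be a scaled regional risk score. Values assumed in [0, 1].
HISTORY = [
    ("192.0.2.10", [0.9, 0.8], [0.7], "malicious"),
    ("192.0.2.11", [0.8, 0.9], [0.8], "malicious"),
    ("198.51.100.5", [0.1, 0.0], [0.2], "benign"),
    ("198.51.100.6", [0.2, 0.1], [0.1], "benign"),
    ("203.0.113.7", [0.3, 0.2], [0.9], "malicious"),
]

def blended_distance(net_a, geo_a, net_b, geo_b, geo_weight):
    """Weighted mix of Euclidean distances in network space and geo space."""
    return (1 - geo_weight) * math.dist(net_a, net_b) + geo_weight * math.dist(geo_a, geo_b)

def label_session(net, geo, history=HISTORY, k=3, geo_weight=0.5):
    """Label an incoming session IP by majority vote of its k most similar IPs."""
    nearest = sorted(
        history, key=lambda h: blended_distance(net, geo, h[1], h[2], geo_weight)
    )[:k]
    votes = Counter(h[3] for h in nearest)
    return votes.most_common(1)[0][0], [h[0] for h in nearest]

incoming_net, incoming_geo = [0.25, 0.15], [0.85]
print(label_session(incoming_net, incoming_geo, geo_weight=0.0))
print(label_session(incoming_net, incoming_geo, geo_weight=0.7))
```

Using network features alone, this session looks benign; giving the geo-contextual feature 70% of the weight flips the vote to malicious. In practice the features would need consistent scaling so that neither domain dominates the distance by accident.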
These scenarios impact not only the countries facing such cyber
security crises but also other countries and end users, due to
the level of connectivity in today’s world.
Studies have also identified regions across the world that are prone
to hosting certain types of attacks.
For example, studies have indicated that Trojans, worms and viruses
are most prevalent in Sub-Saharan Africa, while some families of malware
preferentially target Europe and the US. Yet other studies have described
broad categories of worldwide systemic risks and country-specific risks,
where country-specific risks include aspects of economy, technology,
industry and international cooperation in enforcing laws and
policies.
Geospatial data not only provides additional context but also provides
a framework to accommodate additional geopolitical information,
which often plays a big role in hacktivism or politically inspired
attacks. The figure provides a set of rich sources for accessing
geospatial data for countries and, in some cases, at the more granular
level of cities.
There are very few systems that address ‘zero-day’ attacks, where
the attack signature is not known beforehand.
Integrated alerts from multiple sources
Training & Prediction of Generative Adversarial Networks (GANs)
Step 3: Train Discriminator on Real Dataset
The discriminator is now trained on a real dataset for n epochs. During
this phase the generator is used only in a forward pass to produce fake
samples; the discriminator loss is backpropagated through the
discriminator alone. The real data provided contains no noise and only
real images, while for fake images the discriminator uses instances
created by the generator as negative examples. During discriminator
training:
It classifies both real and fake data.
The discriminator loss improves its performance by penalizing it
when it misclassifies real data as fake or vice versa.
The weights of the discriminator are updated through the discriminator
loss.
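The discriminator loss and a single discriminator update can be sketched for a toy one-dimensional case. The logistic model D(x) = sigmoid(w*x + b), the sample values and the learning rate are assumptions for illustration; real GANs use neural networks and a framework's automatic differentiation. The generated samples are treated as fixed inputs, so only the discriminator parameters w and b change.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def discriminator_loss(w, b, real, fake):
    """Binary cross-entropy: real samples labelled 1, generated samples labelled 0."""
    loss_real = -sum(math.log(sigmoid(w * x + b)) for x in real) / len(real)
    loss_fake = -sum(math.log(1.0 - sigmoid(w * x + b)) for x in fake) / len(fake)
    return loss_real + loss_fake

def discriminator_step(w, b, real, fake, lr=0.1):
    """One gradient-descent step on the discriminator; the generator is frozen."""
    gw = gb = 0.0
    for x in real:  # gradient of -log D(x)
        d = sigmoid(w * x + b)
        gw += -(1.0 - d) * x / len(real)
        gb += -(1.0 - d) / len(real)
    for x in fake:  # gradient of -log(1 - D(x))
        d = sigmoid(w * x + b)
        gw += d * x / len(fake)
        gb += d / len(fake)
    return w - lr * gw, b - lr * gb

real = [1.8, 2.0, 2.2]   # hypothetical real data near 2.0
fake = [-0.4, 0.0, 0.3]  # hypothetical generator outputs, held fixed here
w, b = 0.0, 0.0
before = discriminator_loss(w, b, real, fake)
w, b = discriminator_step(w, b, real, fake)
after = discriminator_loss(w, b, real, fake)
print(f"loss before {before:.3f}, after {after:.3f}")
```

The loss is largest when the discriminator scores real samples as fake and fakes as real, which is the penalty the step describes; each update moves w and b to reduce it.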
Step 4: Train Generator
The generator takes random noise as its input and generates fake
outputs from it. When the generator is trained, the discriminator is
idle, and when the discriminator is trained, the generator is idle.
During generator training, the generator tries to transform random
noise into meaningful data; obtaining meaningful output from the
generator takes time and runs over many epochs. The steps to train the
generator are listed below:
• Sample random noise and produce a generator output for each noise sample.
• Have the discriminator predict whether each generator output is real or fake.
• Calculate the loss from the discriminator's predictions (the generator loss).
• Backpropagate through both the discriminator and the generator to
calculate gradients.
• Use the gradients to update the generator weights only.
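The generator steps above can be sketched in the same toy one-dimensional setting: a linear generator G(z) = a*z + c and a frozen logistic discriminator D(x) = sigmoid(w*x + b). All parameter values are hypothetical, and the loss used is the common non-saturating generator loss -log D(G(z)), chosen here for illustration. The gradient reaches a and c only by passing through the discriminator's weight w, and w and b are never updated.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def generator_loss(a, c, w, b, noise):
    """Non-saturating generator loss -mean(log D(G(z))) with G(z) = a*z + c."""
    return -sum(math.log(sigmoid(w * (a * z + c) + b)) for z in noise) / len(noise)

def generator_step(a, c, w, b, noise, lr=0.1):
    """Backpropagate through the frozen discriminator (w, b) into the generator (a, c)."""
    ga = gc = 0.0
    for z in noise:
        d = sigmoid(w * (a * z + c) + b)  # D's belief that this fake is real
        grad_out = -(1.0 - d) * w         # dLoss/dG(z): passes through D's weight w
        ga += grad_out * z / len(noise)   # chain rule into generator weight a
        gc += grad_out / len(noise)       # chain rule into generator bias c
    return a - lr * ga, c - lr * gc       # w and b are left untouched

w, b = 1.5, -1.5                     # hypothetical frozen discriminator
a, c = 0.5, 0.0                      # hypothetical generator parameters
noise = [-1.0, -0.5, 0.0, 0.5, 1.0]  # fixed noise samples for the comparison
before = generator_loss(a, c, w, b, noise)
a, c = generator_step(a, c, w, b, noise)
after = generator_loss(a, c, w, b, noise)
print(f"generator loss before {before:.3f}, after {after:.3f}; c moved to {c:.3f}")
```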
Step 5: Train Discriminator on Fake Data
The samples generated by the generator are passed to the
discriminator, which predicts whether the data passed to it is real or
fake and again provides feedback to the generator.
Step 6: Train Generator with the Output of Discriminator
The generator is again trained on the feedback given by the
discriminator and tries to improve its performance.
This is an iterative process that continues until the generator
succeeds in fooling the discriminator.
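The alternation of steps 3 to 6 can be sketched as a toy one-dimensional training loop in pure Python. Everything here is an illustrative assumption: real data drawn from N(3, 0.5), a linear generator G(z) = a*z + c with z from N(0, 1), a logistic discriminator, the learning rate and batch size, and a hypothetical stopping rule (after a warm-up, stop once the discriminator's average output on fresh fakes reaches 0.5). GAN training on real data is far less stable, and this loop is not guaranteed to converge.

```python
import math
import random

def sigmoid(s):
    s = max(-60.0, min(60.0, s))  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-s))

def train_gan(max_steps=2000, lr=0.05, batch=16, seed=0):
    """Toy alternating GAN loop (all modelling choices are assumptions)."""
    rng = random.Random(seed)
    a, c = 1.0, 0.0  # generator parameters
    w, b = 0.0, 0.0  # discriminator parameters
    fooled = 0.0
    for step in range(1, max_steps + 1):
        # Discriminator phase: the generator is idle (a, c are not updated).
        real = [rng.gauss(3.0, 0.5) for _ in range(batch)]
        fake = [a * rng.gauss(0.0, 1.0) + c for _ in range(batch)]
        gw = gb = 0.0
        for x in real:
            d = sigmoid(w * x + b)
            gw -= (1.0 - d) * x / batch
            gb -= (1.0 - d) / batch
        for x in fake:
            d = sigmoid(w * x + b)
            gw += d * x / batch
            gb += d / batch
        w, b = w - lr * gw, b - lr * gb

        # Generator phase: the discriminator is idle (w, b are not updated).
        ga = gc = fooled = 0.0
        for _ in range(batch):
            z = rng.gauss(0.0, 1.0)
            d = sigmoid(w * (a * z + c) + b)
            fooled += d / batch  # average D output on fresh fakes
            ga -= (1.0 - d) * w * z / batch
            gc -= (1.0 - d) * w / batch
        a, c = a - lr * ga, c - lr * gc

        # Hypothetical stopping rule: after a warm-up, stop once D's
        # average output on the fakes reaches 0.5.
        if step > 100 and fooled >= 0.5:
            break
    return {"a": a, "b": b, "c": c, "w": w, "steps": step, "fooled": fooled}

result = train_gan()
print(result)
```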
Generative Adversarial Networks (GAN)
We can see that the two components are pitted against each
other. Here, an input (a combination of noise and random images)
is provided to the generator, which generates samples.
On the other hand, the discriminator, which is trained on real-world
images, examines these generated samples.
These samples are labelled as real or fake. In addition, the
discriminator learns from the loss incurred in labelling the samples
and corrects its weights accordingly.
Ethical Thinking in the Data Analytics Process
Ethics in data analytics involves more than just following legal guidelines; it
encompasses a commitment to fairness, transparency, accountability, and
respect for the privacy and rights of individuals. Without a strong ethical
framework, data analytics can lead to biased outcomes, discrimination, and a
loss of trust among stakeholders.
These techniques enhance the overall security posture by complementing
traditional security measures and providing a proactive approach to threat
detection and mitigation.
Best Practices for Ethical Data Use