Unit 1 ML

Uploaded by

mohamedasif350

MODULE I

Machine learning is critical to cyber security because it can make cyber security simpler,
more effective, less expensive, and more proactive. It is a sub-field of Artificial
Intelligence (AI). Machine learning means that computers change the way they do a task by
learning from new data automatically, without being explicitly reprogrammed by humans. With the
help of machine learning, cyber security systems can analyze patterns and help prevent
cyber-attacks. It is about discovering patterns in data and acting on those patterns with
algorithms. A machine learning model uses the patterns it has learned to make predictions about
new data, like a shopping application that gives you recommendations based on your
previous views.

Benefits of Machine Learning

Machine learning can be used in various domains within cyber security to strengthen the security
process and make it easier for a security analyst to identify, prioritize, and deal with new attacks.

● Automating tasks
● Malware recognition and categorization
● Phishing detection
● WebShell detection

Why has machine learning become so critical to cybersecurity?

With machine learning, cyber security systems can analyze patterns and learn from them to help
prevent similar attacks and respond to changing behavior. It can help cybersecurity teams be
more proactive in preventing threats and responding to active attacks in real time. It can reduce
the amount of time spent on routine tasks and enable organizations to use their resources more
strategically.

In short, machine learning can make cyber security simpler, more proactive, less expensive and
far more effective. But it can only do those things if the underlying data that supports the
machine learning provides the complete picture of the environment.

The need for machine learning has to do with complexity. Many organizations today possess a
growing number of Internet of Things (IoT) devices that aren't all known or managed by IT. Not
all data and applications run on-premises, as hybrid and multicloud are the new normal.
Users are no longer mostly in the office, as remote work is widely accepted.

AI and Machine Learning in Cybersecurity:

AI cybersecurity, with the support of machine learning, is set to be a powerful tool in the
near future. As in other industries, human interaction has long been essential and
irreplaceable in security. While cybersecurity currently relies heavily on human input, we are
gradually seeing technology become better than we are at specific tasks.

Among these developments, a few areas of research are at the core of it all:

● Artificial intelligence (AI) is designed to give computers the full responsive ability of the
human mind. This is the umbrella discipline under which many others fall, including machine
learning and deep learning.
● Machine learning (ML) uses existing behavior patterns, making decisions based on past
data and conclusions. Human intervention is still needed for some changes. Machine learning is
likely the most relevant AI cybersecurity discipline to date.
● Deep learning (DL) works similarly to machine learning by making decisions from past patterns
but makes adjustments on its own. Deep learning in cybersecurity currently falls within the scope
of machine learning, so we’ll focus mostly on ML here.

Top 5 Applications of Machine Learning in Cyber Security

Cyber security is a critical part of any company. Not only companies but even governments need
top-class cyber security to make sure that their data remains private and is not hacked or leaked
for all the world to see! And with the increasing popularity of Artificial Intelligence and Machine
Learning, these technologies are even becoming key players in the field of cyber security.
Machine Learning has many applications in Cyber Security including identifying cyber threats,
improving available antivirus software, fighting cyber-crime that also uses AI capabilities, and so
on.

1. Cyber Threat Identification

Cyber security is a very important component of all companies. After all, if a hacker manages to
enter their systems, the damage can be severe. The most difficult part of cyber security is
finding out whether connection requests into the system are legitimate, and whether
suspicious-looking activities, such as sending and receiving large amounts of data, are the work
of employees in the company or of an attacker. This is very difficult for cyber security
professionals to judge, especially in large companies where thousands of requests arrive all the
time and humans are not always accurate. That is where machine learning can provide a lot of
help to professionals. A cyber threat identification system that is powered by AI and ML can be
used to monitor all outgoing and incoming calls, as well as all requests to the system, for
suspicious activity. For example, Versive is an artificial intelligence vendor that provides cyber
security software in conjunction with AI.

2. AI-based Antivirus Software

It is commonly recommended to install antivirus software before using any system. This is because
antivirus protects your system by scanning any new files on the network to identify whether they
match a known virus or malware signature. However, traditional antivirus requires constant
updates to keep up with the new viruses and malware being created. That is where machine learning
can be extremely helpful. Antivirus software that is integrated with machine learning tries to
identify a virus or malware by its abnormal behavior rather than by its signature. In this way, it
can manage threats that are common and previously encountered as well as new threats from
recently created viruses or malware. For example, Cylance, a software company, has created a
smart antivirus that learns how to detect viruses or malware from scratch and thus does not
depend on identifying their signatures to detect them.

3. User Behavior Modeling

Some cyber threats can attack a particular company by stealing the login credentials of any of
their users and then illegally logging into the network. This is very difficult to detect by normal
antivirus as the user credentials are authentic and the cyberattack may even happen without
anyone knowing. Here, machine learning algorithms can provide help by using user behavior
modeling. The machine learning algorithm can be trained to identify the behavior of each user
such as their login and logout patterns. Then, any time a user deviates from their normal
behavior, the machine learning algorithm can identify it and alert the cybersecurity
team that something is out of the ordinary. Of course, some changes in user behavior patterns are
entirely natural, but this will still help in catching more cyber threats than conventional methods.
For example, there is a cybersecurity software provided by Darktrace that uses machine
learning to identify the normal behavioral patterns of all the users in a system by analyzing the
network traffic information.
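
The idea of learning a per-user baseline and flagging deviations can be sketched in a few lines. This is only a minimal illustration, not how any vendor's product works: the login-hour history, the function names, and the threshold of three standard deviations are all assumptions made for the example.

```python
from statistics import mean, stdev

def build_baseline(login_hours):
    """Summarise a user's past login hours as (mean, standard deviation)."""
    return mean(login_hours), stdev(login_hours)

def is_unusual(hour, baseline, threshold=3.0):
    """Flag a login whose hour lies more than `threshold` standard
    deviations away from the user's historical mean."""
    mu, sigma = baseline
    if sigma == 0:
        # No variation in the history: anything different is unusual.
        return hour != mu
    return abs(hour - mu) / sigma > threshold

# Hypothetical history: this user normally logs in around 9 am.
history = [9, 9, 10, 8, 9, 10, 9, 8, 9, 10]
baseline = build_baseline(history)

print(is_unusual(9, baseline))  # a typical morning login
print(is_unusual(3, baseline))  # a 3 am login
```

A real system would use many more features than the login hour and would need to treat the hour as circular (23:00 and 00:00 are close), which this sketch ignores.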

4. Fighting AI Threats

Many hackers are now taking advantage of technology and using machine learning to find the
holes in security and hack systems. Therefore, it is very important that companies fight fire with
fire and use machine learning for cybersecurity as well. This might even become the standard
approach for defending against cyber attacks as attackers become more and more tech-savvy. Take
into account the devastating NotPetya attack that utilized EternalBlue, an exploit for a
vulnerability in Microsoft's Windows OS. These types of attacks can get even more devastating in
the future with the help of artificial intelligence and machine learning unless cybersecurity
software also uses the same technology. An example of this is CrowdStrike, a cybersecurity
technology company whose Falcon platform is security software that uses artificial
intelligence to handle various cyber attacks.

5. Email Monitoring

It is very important to monitor the official email accounts of employees in a company to prevent
cybersecurity attacks such as phishing. Phishing attacks are carried out by sending fraudulent
emails to employees and asking them for private information, such as sensitive information
related to their job, their banking and credit card details, company passwords, etc. Cybersecurity
software along with machine learning can be used to avoid these phishing traps by monitoring
the employees' professional emails to check whether any features indicate a cybersecurity threat.
Natural language processing can also be used to scan the emails and see if there is anything
suspicious, such as patterns and phrases that may indicate that the email is a phishing
attempt. For example, Tessian is a software company that provides email monitoring
software that can be used to check whether an email is a phishing attempt or a data breach. This
is done using natural language processing and anomaly detection technologies to identify threats.

Future of Machine Learning and Cybersecurity

Machine learning is still a comparatively new addition to the field of cybersecurity. However, the
five applications of machine learning in cybersecurity given above are a good start in this field.
The main thing to keep in mind is that machine learning algorithms should minimize their false
positives, i.e. actions that they identify as malicious or part of a cyber attack but that are not.
Companies need to ensure that they consult with their cybersecurity specialists, who can provide
the best solutions for identifying and handling new and different types of cyber attacks with even
more precision using machine learning.
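
To make the notion of false positives concrete, the sketch below counts them for a detector's alerts and derives two common rates. The labels and predictions are made-up illustration data (1 = malicious, 0 = benign), and the function name is hypothetical.

```python
def alert_metrics(y_true, y_pred):
    """Count outcomes for binary labels (1 = malicious, 0 = benign)
    and derive precision and the false positive rate."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # real attacks flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign activity flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign activity ignored
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # attacks missed
    return {
        "false_positives": fp,
        # Share of alerts that were real attacks.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Share of benign events that raised an alert anyway.
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        # Share of real attacks that were caught.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Made-up ground truth and detector output for ten events.
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

metrics = alert_metrics(y_true, y_pred)
print(metrics)
```

Here two of the four alerts are false alarms, so an analyst would spend half of their triage time on benign activity; that is the cost the text warns about.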

How Machine Learning Enables the Future of Cybersecurity

Machine learning supports modern cybersecurity solutions in a number of different ways.
Individually, each one is valuable, and together they are game-changing for maintaining a strong
security posture in a dynamic threat landscape.

Identification and profiling: With new devices getting connected to enterprise networks all the
time, it’s not easy for an IT organization to be aware of them all. Machine learning can be used to
identify and profile devices on a network. That profile can determine the different features and
behaviors of a given device.

Automated anomaly detection: Using machine learning to rapidly identify known bad
behaviors is a great use case for security. After first profiling devices and understanding regular
activities, machine learning knows what’s normal and what’s not.

Zero-day detection: With traditional security, a bad action has to be seen at least once for it to
be identified as a bad action. That’s the way that legacy signature-based malware detection
works. Machine learning can intelligently identify previously unknown forms of malware and
attacks to help protect organizations from potential zero-day attacks.

Policy recommendations: The process of building security policies is often a very manual effort
that has no shortage of challenges. With an understanding of what devices are present and what
is normal behavior, machine learning can help to provide policy recommendations for security
devices, including firewalls. Instead of having to manually navigate around different conflicting
access control lists for different devices and network segments, machine learning can make
specific recommendations that work in an automated approach.

With more devices and threats coming online every day, and human security resources in scarce
supply, machine learning is needed to sort through complicated situations and scenarios at scale
and enable organizations to meet the challenge of cyber security now and in the years to come.

Cyber Threat Landscape:

The threat landscape means the entire scope of potential and recognized cyber security threats
affecting particular user groups, organizations, or industries at a particular time.

As new cyber threats emerge daily, the threat landscape changes accordingly. The main factors
contributing to the dynamic threat landscape include:

● Increasingly sophisticated tools and attack methods;

● Greater reliance on information technology products and services, such as SaaS offerings;
● Networks that encourage and enable the distribution of cybercrime profits, such as the
dark web;
● Greater availability of skills, personnel, and finances to drive cyber attacks;
● External factors, such as a global pandemic or financial crisis;
● Faster software releases with added functionality;
● New hardware development, such as Internet of Things (IoT) devices.

What’s included in the threat landscape

The threat landscape is usually thought of as including the vulnerabilities, malware, and specific
groups of attackers and their techniques that represent a danger in a given context.
By “context,” we mean the specifics of a particular sector, organization, or even individual,
including the following (among many more):

● Possession of information of value to attackers;

● Security level;
● Geopolitical factors (some threats, APTs in particular, target organizations or people based in a
particular country or region).

The threat landscape changes both over time and as a result of events with a significant impact
on the organization, group of people, or sector for which the threat landscape is defined. For
example, as a result of 2020’s large-scale shift to work from home, attacks
targeting remote-access tools have surfaced on many companies’ threat landscapes. The
following factors, among others, influence the threat landscape:

● The emergence and discovery of vulnerabilities that provide cybercriminals with new attack
opportunities.
● The release of new software versions with additional functionality.
● The development of new hardware platforms, as well as the emergence of new approaches to data
processing, such as the use of cloud services or edge computing.
● Global events such as the COVID-19 pandemic compelling organizations to make major changes
to their infrastructure.

How to Protect Against the Threat Landscape

Here are three ways to protect your organization against the threat landscape:

1. Understand the Different Types of Threats

2. Gain Visibility Over Your Attack Surface

3. Use Defensive Measures

AI & ML:

Artificial intelligence is a field of computer science that builds computer systems that can
mimic human intelligence. The term is made up of two words, "Artificial" and "intelligence", which
together mean "a human-made thinking power."

An artificial intelligence system does not need to be pre-programmed for every situation; instead,
it uses algorithms that can work with their own intelligence. It involves machine learning
algorithms such as reinforcement learning algorithms and deep learning neural networks. AI is
being used in many places, such as Siri, Google's AlphaGo, AI in chess playing, etc.

AI services can be classified into Vertical or Horizontal AI

What is Vertical AI?

These are services that focus on a single job, whether that is scheduling meetings, automating
repetitive work, etc. Vertical AI bots perform just one job for you and do it so well that we
might mistake them for a human.

What is Horizontal AI?

These services are able to handle multiple tasks. There is no single job to be
done. Cortana, Siri and Alexa are some examples of Horizontal AI. These services mostly work
in a question-and-answer setting, such as “What is the temperature in New
York?” or “Call Alex”. They work across multiple tasks and are not dedicated to one particular task.

Machine learning

Machine learning is about extracting knowledge from data. Machine learning enables a
computer system to make predictions or take decisions using historical data without being
explicitly programmed. Machine learning uses a massive amount of structured and
semi-structured data so that a machine learning model can generate accurate results or give
predictions based on that data.

It can be divided into three types:

● Supervised learning
● Unsupervised learning
● Reinforcement learning

Supervised Learning

In supervised learning, labeled training datasets are provided to the system, so each input comes
with its correct output. Supervised learning algorithms analyse the data and produce an inferred
function that can be applied to new inputs.

Unsupervised Learning

Unsupervised learning is often much harder because the data fed to the algorithm is unlabeled
and ungrouped. Here the goal is to have the machine learn on its own without any
supervision. The correct solution to the problem is not provided. The algorithm itself finds the
patterns in the data.

Reinforcement Learning

This type of machine learning algorithm allows software agents and machines to automatically
determine the ideal behaviour within a specific context in order to maximise their performance.
Reinforcement learning is defined by characterising a learning problem and not by characterising
learning methods.
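
One of the simplest reinforcement learning settings is an agent that repeatedly chooses between a few actions and learns from the reward it receives. The sketch below uses the epsilon-greedy strategy as an illustrative choice; the three reward means, the bounded noise, and the function name are assumptions for the example, not part of the original notes.

```python
import random

def epsilon_greedy(reward_means, steps=1000, epsilon=0.1, seed=0):
    """Learn which action pays best by trial and error.

    With probability `epsilon` the agent explores a random action;
    otherwise it exploits the action with the best estimated reward."""
    rng = random.Random(seed)
    n_actions = len(reward_means)
    estimates = [0.0] * n_actions  # running average reward per action
    counts = [0] * n_actions       # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])  # exploit
        # Hypothetical environment: the action's mean reward plus small noise.
        reward = reward_means[action] + rng.uniform(-0.1, 0.1)
        counts[action] += 1
        # Incremental update of the running average for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

estimates, counts = epsilon_greedy([1.0, 2.0, 3.0])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print(best_action, counts)
```

The agent is never told which action is correct; it only observes rewards, which is what separates this setting from supervised learning.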

Real World uses of ML in Security

Real Life Examples

Machine learning can quickly scan large amounts of data and analyze it using statistics. Modern
organizations generate huge amounts of data, so it’s no wonder the technology is such a useful
tool.

1.Using machine learning to detect malicious activity and stop attacks

Machine learning algorithms can help businesses to detect malicious activity faster and stop
attacks before they get started. David Palmer should know. As director of technology at
UK-based start-up Darktrace, a firm that has seen a lot of success around its machine
learning-based Enterprise Immune System since its foundation in 2013, he has seen the
impact of such technologies.

2.Using machine learning to analyze mobile endpoints

Machine learning is already going mainstream on mobile devices, but thus far most of this
activity has been for driving improved voice-based experiences on the likes of Google Now,
Apple's Siri, and Amazon's Alexa. Yet there is an application for security too. Google is using
machine learning to analyze threats against mobile endpoints, while
enterprises are seeing an opportunity to protect the growing number of bring-your-own and
choose-your-own mobile devices.

3.Using machine learning to enhance human analysis

At the heart of machine learning in security is the belief that it helps human analysts with
all aspects of the job, including detecting malicious attacks, analyzing the network, endpoint
protection, and vulnerability assessment. Arguably, though, the most excitement is around threat
intelligence.

4.Using machine learning to automate repetitive security tasks

The real benefit of machine learning is that it could automate repetitive tasks, enabling staff to
focus on more important work. Palmer says that machine learning ultimately should aim to
“remove the need for humans to do repetitive, low-value decision-making activity, like triaging
threat intelligence.” He adds: “Let the machines handle the repetitive work and the tactical
firefighting like interrupting ransomware so that the humans can free up time to deal with
strategic issues (like modernizing off Windows XP) instead.”

5.Using machine learning to close zero-day vulnerabilities

Some believe that machine learning could help close vulnerabilities, particularly zero-day threats
and others that target largely unsecured IoT devices.

Classifying and Clustering:

Both classification and clustering are used to categorize objects into one or more
classes based on their features. They appear to be similar processes, but there is a basic
difference. In classification, predefined labels are assigned to each input instance
according to its properties, whereas in clustering those labels are missing.

Differences between Classification and Clustering

Classification is used for supervised learning whereas clustering is used for unsupervised
learning.

The process of classifying the input instances based on their corresponding class labels is known
as classification whereas grouping the instances based on their similarity without the help of
class labels is known as clustering.
Because classification has labels, training and testing datasets are needed to verify the
model created, whereas clustering does not need training and testing datasets.

Classification is more complex than clustering, as there are many stages in the
classification phase, whereas only grouping is done in clustering.

Classification examples are Logistic regression, Naive Bayes classifier, Support vector machines,
etc. Whereas clustering examples are k-means clustering algorithm, Fuzzy c-means clustering
algorithm, Gaussian (EM) clustering algorithm, etc.

Comparison between Classification and Clustering:

| Parameter          | CLASSIFICATION | CLUSTERING |
|--------------------|----------------|------------|
| Type               | used for supervised learning | used for unsupervised learning |
| Basic              | process of classifying the input instances based on their corresponding class labels | grouping the instances based on their similarity without the help of class labels |
| Need               | it has labels so there is need of training and testing dataset for verifying the model created | there is no need of training and testing dataset |
| Complexity         | more complex as compared to clustering | less complex as compared to classification |
| Example Algorithms | Logistic regression, Naive Bayes classifier, Support vector machines, etc. | k-means clustering algorithm, Fuzzy c-means clustering algorithm, Gaussian (EM) clustering algorithm, etc. |

Machine Learning: Problems and Approaches

Machine Learning provides businesses with the knowledge to make more informed, data-driven
decisions that are faster than traditional approaches. However, it's not the mythical, magical
process many build it up to be. Machine Learning presents its own set of challenges. Here are
five common machine learning problems and how you can overcome them.

1) Understanding Which Processes Need Automation

It's becoming increasingly difficult to separate fact from fiction in terms of Machine Learning
today. Before you decide on which AI platform to use, you need to evaluate which problems
you’re seeking to solve. The easiest processes to automate are the ones that are done manually
every day with no variable output. Complicated processes require further inspection before
automation. While Machine Learning can definitely help automate some processes, not all
automation problems need Machine Learning.

2) Lack of Quality Data

The number one problem facing Machine Learning is the lack of good data. While enhancing
algorithms often consumes most of the time of developers in AI, data quality is essential for the
algorithms to function as intended. Noisy data, dirty data, and incomplete data are the
quintessential enemies of ideal Machine Learning. The solution to this conundrum is to take the
time to evaluate and scope data with meticulous data governance, data integration, and data
exploration until you get clean data. You should do this before you start.

3) Inadequate Infrastructure

Machine Learning requires vast data-processing capacity. Legacy systems often
can't handle the workload and buckle under pressure. You should check whether your infrastructure
can handle Machine Learning. If it can't, you should look to upgrade, complete with hardware
acceleration and flexible storage.

4) Implementation

Organizations often already have analytics engines in place by the time they choose to
upgrade to Machine Learning. Integrating newer Machine Learning methodologies into existing
methodologies is a complicated task. Maintaining proper interpretation and documentation goes
a long way to easing implementation. Working with an implementation partner can make the
implementation of services like anomaly detection, predictive analysis, and ensemble modeling
much easier.

5) Lack of Skilled Resources

Deep analytics and Machine Learning in their current forms are still new technologies. Thus,
there is a shortage of skilled employees available to manage and develop analytical content for
Machine Learning. Data scientists often need a combination of domain experience as well as
in-depth knowledge of science, technology, and mathematics. Recruitment will require you to
pay large salaries, as these employees are often in high demand and know their worth. You can
also approach your vendor for staffing help, as many managed service providers keep a list of
skilled data scientists to deploy anytime.

Examples of Machine Learning Models

A machine learning model is a program that can find patterns or make decisions from a
previously unseen dataset.

1. Image recognition

Image recognition is a well-known and widespread example of machine learning in the real
world. It can identify an object in a digital image, based on the intensity of the pixels in black
and white images or color images.

Real-world examples of image recognition:

● Label an x-ray as cancerous or not

● Assign a name to a photographed face (aka “tagging” on social media)
● Recognise handwriting by segmenting a single letter into smaller images

Machine learning is also frequently used for facial recognition within an image. Using a database
of people, the system can identify commonalities and match them to faces. This is often used in
law enforcement.

2. Speech recognition

Machine learning can translate speech into text. Certain software applications can convert live
voice and recorded speech into a text file. The speech can be segmented by intensities on
time-frequency bands as well.

Real-world examples of speech recognition:

● Voice search
● Voice dialling
● Appliance control

Some of the most common uses of speech recognition software are devices like Google
Home or Amazon Alexa.

3. Medical diagnosis

Machine learning can help with the diagnosis of diseases. Many physicians use chatbots with
speech recognition capabilities to discern patterns in symptoms.

Real-world examples for medical diagnosis:

● Assisting in formulating a diagnosis or recommending a treatment option
● Recognising cancerous tissue in oncology and pathology
● Analysing bodily fluids

4. Statistical arbitrage

Statistical arbitrage is an automated trading strategy that's used in finance to manage a large
volume of securities. The strategy uses a trading algorithm to analyse a set of securities using
economic variables and correlations.

Real-world examples of statistical arbitrage:

● Algorithmic trading which analyses a market microstructure
● Analysing large data sets
● Identifying real-time arbitrage opportunities

Machine learning optimises the arbitrage strategy to enhance results.

5. Predictive analytics

Machine learning can classify available data into groups, which are then defined by rules set by
analysts. When the classification is complete, the analysts can calculate the probability of a fault.

Real-world examples of predictive analytics:

● Predicting whether a transaction is fraudulent or legitimate
● Improving prediction systems to calculate the possibility of a fault

Predictive analytics is one of the most promising examples of machine learning. It is
applicable to everything from product development to real estate pricing.

6. Extraction

Machine learning can extract structured information from unstructured data. Organisations amass
huge volumes of data from customers. A machine learning algorithm automates the process of
annotating datasets for predictive analytics tools.

Real-world examples of extraction:

● Generating a model to predict vocal cord disorders
● Developing methods to prevent, diagnose, and treat the disorders
● Helping physicians diagnose and treat problems quickly

Typically, these processes are tedious. But machine learning can track and extract
information to obtain billions of data samples.

Model Families in Machine Learning:

The four families

● Anomaly detection
● Classification
● Clustering
● Regression

Which one to use?

Anomaly detection => Identify unusual patterns

Classification => Assign a label or category

Clustering => Group similar data points together

Regression => Predict a numeric value from the relationship between variables

Business Examples

Anomaly detection

Detecting bank card fraud by looking for anomalies in the amounts that have been spent.
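
As a rough sketch of this use case, the snippet below flags card amounts that fall far outside the typical range using the interquartile range (IQR) rule. The IQR rule is just one simple choice of anomaly detector, and the amounts and function name are invented for illustration.

```python
from statistics import quantiles

def iqr_outliers(amounts, k=1.5):
    """Return amounts lying more than k * IQR outside the middle 50% of the data."""
    q1, _, q3 = quantiles(amounts, n=4)  # first quartile, median, third quartile
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [a for a in amounts if a < lower or a > upper]

# Hypothetical card transactions; one is far larger than the rest.
amounts = [12.5, 30.0, 18.2, 25.0, 22.4, 15.9, 27.3, 19.8, 950.0, 21.1]

print(iqr_outliers(amounts))
```

Quartiles are used instead of the mean and standard deviation because a single very large transaction would inflate those and could hide itself.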

Classification

Detecting whether a mail you have received is spam or not spam. There are two categories, and
the model gives the likelihood of a mail belonging to each.
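
A small Naive Bayes classifier shows how such a likelihood can be computed from labeled examples. This is a minimal sketch with a four-message toy training set; the messages, labels ("spam" and "ham"), and function names are assumptions for illustration only.

```python
import math
from collections import Counter

def train_naive_bayes(messages):
    """messages: list of (text, label) pairs with label 'spam' or 'ham'."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter()
    for text, label in messages:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, class_counts, vocab

def spam_probability(text, model):
    """Return the estimated probability that `text` is spam."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    log_scores = {}
    for label in ("spam", "ham"):
        n_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total)  # class prior
        for word in text.lower().split():
            # Laplace smoothing so unseen words do not zero out the score.
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        log_scores[label] = score
    # Turn the two log scores into a normalised probability.
    top = max(log_scores.values())
    weights = {k: math.exp(v - top) for k, v in log_scores.items()}
    return weights["spam"] / (weights["spam"] + weights["ham"])

# Toy labeled data (hypothetical).
training = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]
model = train_naive_bayes(training)

p_spam = spam_probability("claim your free prize", model)
p_work = spam_probability("agenda for the team meeting", model)
print(round(p_spam, 3), round(p_work, 3))
```

Because the labels are known in advance, this is a supervised, classification-family model.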

Clustering

Let's say we have a data set of the longitude and latitude of accidents that happened in a
specific country. With a clustering model, we could create clusters to see in which parts of the
country most of the accidents happen.
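
The accident example can be sketched with a tiny k-means implementation. The coordinates below are invented, the initialisation (the first k points) is deliberately naive, and plain Euclidean distance on latitude and longitude is only a rough approximation of real distance.

```python
def kmeans(points, k, iterations=10):
    """Group 2-D points into k clusters with basic k-means."""
    centroids = list(points[:k])  # naive initialisation: the first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                                        + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, labels

# Hypothetical (latitude, longitude) pairs of accident sites near two cities.
accidents = [
    (12.97, 77.59), (28.61, 77.21), (12.93, 77.62),
    (28.65, 77.23), (13.01, 77.57), (28.58, 77.19),
]
centroids, labels = kmeans(accidents, k=2)
print(labels)
print(centroids)
```

No labels are supplied; the two accident hot spots emerge from the data alone, which is what makes this an unsupervised method.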

Regression

If you want to estimate approximately how many minutes late a train will be, then a regression
model is best suited to that use case.
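
A minimal sketch of such a regression model is a straight line fitted by least squares. The single input feature (rainfall in millimetres) and the delay figures are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical history: rainfall (mm) versus observed delay (minutes).
rainfall_mm = [0, 5, 10, 15, 20]
delay_min = [2, 4, 6, 8, 10]

slope, intercept = fit_line(rainfall_mm, delay_min)
predicted_delay = slope * 12 + intercept  # expected delay at 12 mm of rain
print(slope, intercept, predicted_delay)
```

Unlike classification, the output is a number on a continuous scale rather than a category.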

Loss Function:

The loss function is a method of evaluating how well your algorithm models your dataset. It
is a mathematical function of the parameters of the machine learning algorithm. “The function
we want to minimize or maximize is called the objective function or criterion. When we are
minimizing it, we may also call it the cost function, loss function, or error function.”

In mathematical optimization and decision theory, a loss or cost function (sometimes also called
an error function) is a function that maps an event or values of one or more variables onto a real
number intuitively representing some “cost” associated with the event. A loss function is a
measure of how well your prediction model is able to predict the
expected outcome (or value). We convert the learning problem into an optimization
problem, define a loss function, and then optimize the algorithm to minimize the loss
function.

Loss Function:

A loss function/error function is for a single training example/input.

Cost Function:

A cost function, on the other hand, is the average loss over the entire training dataset.
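
The distinction can be written compactly as follows. The symbols are assumptions introduced here for illustration: \(f_\theta\) is the model with parameters \(\theta\), \((x_i, y_i)\) is the i-th of \(n\) training examples, and \(L\) is whichever loss is chosen.

```latex
% Loss for a single training example (x_i, y_i):
\ell_i = L\bigl(y_i, f_\theta(x_i)\bigr)

% Cost: the average of the per-example losses over the training set
J(\theta) = \frac{1}{n} \sum_{i=1}^{n} L\bigl(y_i, f_\theta(x_i)\bigr)

% Example with squared error chosen as the loss:
J(\theta) = \frac{1}{n} \sum_{i=1}^{n} \bigl(y_i - f_\theta(x_i)\bigr)^2
```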

Example:

Assume you are given the task of filling a bag with 10 kg of sand. You keep adding sand until the
weighing machine gives you a reading of exactly 10 kg, and you take sand out if the reading
exceeds 10 kg.

Just like that weighing machine, if your predictions are off, your loss function will output a
higher number. If they're pretty good, it'll output a lower number. As you experiment with your
algorithm to try to improve your model, your loss function will tell you whether you're getting
anywhere.

What are the types of loss functions?

Most commonly used loss functions are:

● Mean Squared error

● Mean Absolute Error

● Log-Likelihood Loss
● Hinge Loss
● Huber Loss
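
The five losses listed above can be sketched in plain Python as follows. These are the standard textbook definitions averaged over a batch; the function names, the `delta` default, and the label conventions (0/1 for log loss, -1/+1 for hinge loss) are choices made for this example.

```python
import math

def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def log_loss(y_true, p_pred, eps=1e-12):
    """Negative log-likelihood for binary labels in {0, 1} and predicted
    probabilities of class 1."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # keep log() away from 0
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)

def hinge_loss(y_true, scores):
    """Labels in {-1, +1}; scores are raw classifier outputs."""
    return sum(max(0.0, 1 - t * s) for t, s in zip(y_true, scores)) / len(y_true)

def huber_loss(y_true, y_pred, delta=1.0):
    """Quadratic for small errors, linear for large ones."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        r = abs(t - p)
        total += 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)
    return total / len(y_true)

y_true = [3.0, 5.0, 2.0]
y_pred = [2.0, 5.0, 4.0]
print(mean_squared_error(y_true, y_pred))
print(mean_absolute_error(y_true, y_pred))
print(huber_loss(y_true, y_pred))
print(log_loss([1, 0], [0.5, 0.5]))
print(hinge_loss([1, -1], [0.5, -2.0]))
```

The first, second, and fifth are typically used for regression, while log loss and hinge loss are typically used for classification.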

Optimization:

Machine learning optimization is the process of iteratively improving the accuracy of a machine
learning model, lowering the degree of error. Most machine learning models use training data to
learn the relationship between input and output data. The models can then be used to make
predictions about trends or classify new input data. This training is a process of optimization, as
each iteration aims to improve the model’s accuracy and lower the margin of error.

Machine learning optimization is the process of adjusting the model's parameters (and, at a
higher level, its hyperparameters) in order to minimize the cost function by using one of the
optimization techniques. It is important to minimize the cost function because it describes the
discrepancy between the true value of the quantity being estimated and what the model has
predicted.
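
Gradient descent is one widely used optimization technique, and the sketch below applies it to a one-parameter model y = w * x with mean squared error as the cost. The data, the learning rate, and the number of steps are illustrative assumptions; note that the weight `w` is a parameter learned from data, while the learning rate is a hyperparameter chosen beforehand.

```python
def cost(w, xs, ys):
    """Mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient(w, xs, ys):
    """Derivative of the cost with respect to w."""
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)

def gradient_descent(xs, ys, learning_rate=0.05, steps=50):
    w = 0.0                      # initial guess for the parameter
    costs = [cost(w, xs, ys)]    # track the cost to see it go down
    for _ in range(steps):
        w -= learning_rate * gradient(w, xs, ys)  # step against the gradient
        costs.append(cost(w, xs, ys))
    return w, costs

# Toy data generated from y = 2 * x, so the best weight is 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w, costs = gradient_descent(xs, ys)
print(w, costs[0], costs[-1])
```

Each iteration lowers the cost, which mirrors the description of training as a process that aims to reduce the margin of error step by step.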