Real Time Network Traffic Analysis Using Artificial Intelligence, Machine Learning and Deep Learning: A Review of Methods, Tools and Applications
All content following this page was uploaded by Mohana .. on 23 December 2023.
Abstract - Network Traffic Analysis (NTA) involves monitoring and analyzing network traffic in order to spot potential security threats or performance problems. Machine Learning (ML) methods are frequently used to automate NTA. Network traffic classification, anomaly detection, and malicious activity detection can all be done using ML techniques. They can also be used to forecast future traffic patterns in order to enhance network performance. ML algorithms come in a variety of forms and can be applied to NTA. Support vector machines (SVM), decision trees, and random forests are the most used methods. The choice of algorithm depends on the particular application. SVMs, for instance, are frequently used for classification tasks, whereas decision trees are frequently used for anomaly detection. NTA employing ML can enhance network performance and security. It can aid in the detection of possible risks, the prevention of data breaches, and the enhancement of network performance. In the proposed work, real-time NTA using ML and DL algorithms is discussed along with tools and applications. A random forest algorithm is implemented and obtains an accuracy of 99.31%. The benefits of applying ML to NTA include: increased accuracy, since ML algorithms have the potential to be more precise than conventional rule-based techniques when spotting threats and anomalies; fewer false positives, since ML algorithms can be customized to produce fewer false positives, which can save time and money; and enhanced scalability, since ML algorithms can be scaled to manage high levels of network traffic.

Keywords - You Only Look Once (YOLO), Convolutional Neural Network (CNN), Artificial Intelligence (AI), Machine Learning (ML), Deep Learning (DL)

I. INTRODUCTION

The process of gathering, examining, and interpreting network traffic data in order to spot anomalies, performance problems, and malicious behavior is known as network traffic analysis. Conventional techniques for analyzing network traffic frequently rely on signature-based detection, which searches for patterns of malicious traffic that have already been identified. This strategy may not work against emerging or novel threats. Deep learning and artificial intelligence (AI) open up new avenues for real-time network traffic analysis. Without the requirement for pre-defined signatures, harmful traffic patterns can be recognized by AI-powered traffic analysis tools. They are hence more potent against emerging and changing threats.

Real-time NTA is a difficult undertaking for several reasons. First, network traffic is frequently both high-speed and high-volume. Because of this, gathering and analyzing all of the traffic in real time is challenging. Second, a lot of network traffic is encrypted. This can make it challenging to detect malicious activity, since it is harder to view the content of the traffic. Third, traffic on networks is ever-changing. This implies that, in order to stay abreast of the most recent threats, the models used for traffic analysis must be updated on a regular basis.

A number of these difficulties can be addressed by AI and deep learning. First, without the need for pre-defined signatures, AI-powered traffic analysis systems can use machine learning to learn to recognize harmful traffic patterns. They are hence more potent against emerging and changing threats. Second, deep learning can be used by AI-powered traffic analysis systems to analyze encrypted traffic: DL models have the capacity to recognize encrypted traffic patterns linked to harmful behavior. Third, compared to conventional traffic analysis tools, AI-powered solutions can receive updates more often, owing to the ease and speed with which AI models can be trained on new data sets.

II. LITERATURE SURVEY AND PROBLEM ANALYSIS

Neeraj Namdev et al [1] presented that the use of the internet is playing a huge role as technology and communication systems evolve. As a result, data and traffic through the internet rise exponentially. The categorization of internet traffic is a particularly common method used in intrusion detection systems.

Nour Alqudah et al [2] presented that traffic analysis serves a variety of functions, including assessing the efficiency and security of network administration and operations. It is
believed that NTA is essential for enhancing the security and functionality of networks. New approaches are needed to identify intrusions, categorize Internet traffic, and analyze virus behavior as a result of growing network traffic and the development of artificial intelligence.

A. Jamuna et al [3] presented that application-based traffic classification is a key component of network security and management. Traditional methods include port-based and payload-based deep inspection approaches. Privacy concerns, dynamic ports, and encrypted applications make the conventional procedures ineffective in the current network context.

T. Bujlow et al [4] presented that monitoring network performance in high-speed internet infrastructure is a difficult undertaking because the requirements for a given quality level depend on the type of service. Understanding the different sorts of applications constituting the present network traffic is consequently necessary for backbone QoS monitoring and analysis in multihop networks. The C5.0 ML algorithm was suggested to improve traffic classification and address the shortcomings of the current approaches.

Amin Shahraki et al [5] presented that network data analysis is crucial for a variety of objectives, including network resource management and cyber-security analysis. Analytical techniques are needed that can process network data online as new data arrives. Online ML (OL) approaches are expected to support such data analytics.

A. Priya et al [6] presented that numerous additional techniques were suggested to address the current technical problems. The primary objective of network traffic classification is network performance improvement. The suggested framework uses unsupervised ML to identify the user's browser and application from real-time network information.

V. A. Muliukha et al [7] presented the application of map-reduce technology to NTA. The work describes how to determine where a frame in a PCAP file begins and explains the procedure for producing training samples for various traffic types. It also describes the programme "Tractor" used in the experiments, as well as the findings of experimental investigations on the analysis of encrypted traffic using the random forest method and the naive Bayes classifier.

E. Nazarenko et al [8] considered the theoretical underpinnings of traditional ML algorithms, namely their operating principles and parameters. The network traffic categorization and identification system they created makes use of the k-NN, Naive Bayes, and SVM ML algorithms.

E. Osa et al [9] presented that intrusion detection systems are used for network monitoring and for detecting unauthorized access or malicious traffic over secured networks. The research compares a few ML techniques for identifying anomalies in network data. Decision Tree was judged to provide the overall best outcome.

L. Trajković et al [10] presented that traffic traces gathered from operational communication networks are used to characterize and quantify traffic loads, examine user behavior patterns, model network traffic, and forecast future network traffic. Traffic traces gathered from various deployed networks and the Internet are used to characterize and model network traffic, examine Internet topologies, and categorize network anomalies.

Yang et al [11] presented that accurate network traffic identification is the foundation for network traffic monitoring, data analysis, and user service quality improvement. DPI technology can also identify specific application traffic, which increases the accuracy of identification. The use of an ML method based on statistical flow characteristics helps identify network flows with encryption and unknown features, making up for the shortcomings of DPI technology.

Y. Xue et al [12] first analyzed the present traffic classification methodologies, concentrating on their issues and challenges, before discussing some suggestions that can improve traffic performance.

A. Boukhalfa et al [13] presented that the detection of threats is complicated by the rising volume and diversity of data in network interactions. In order to identify new, previously unseen threats, the study employs a novel approach to NTA that leverages big data frameworks to gather substantial volumes of network traffic data and Deep Learning (DL) algorithms to analyze it.

K. Limthong et al [14] studied the relationship between interval-based features of network traffic and various sorts of network anomalies using two well-known ML techniques, naive Bayes and k-nearest neighbor. The study used these two ML algorithms, five different kinds of test bed anomalies, and nine interval-based features of network traffic to analyze and assess the accuracy of each feature. Although obtained only from the naive Bayes and k-nearest neighbor algorithms, the preliminary results showed the more useful features for each of the anomaly types.

M. Shafiq et al [15] presented that Internet traffic can be categorized using a variety of techniques, including ML, payload analysis, and port-based classification. The ML approach is currently the most often used method; it has achieved very accurate results and is used by many researchers. The limitations of the several network traffic classification techniques (port-based, payload-based, and ML-based) are presented. The network traffic classification model is then organized from traffic capture to final output. The Wireshark tool was used to record traffic from five applications (WWW, DNS, FTP, P2P, and TELNET) for one minute, and the Netmate programme was used to extract 23 attributes for a comparative study of four algorithms.

M. Ramires et al [16] presented that the goal of traffic classification methods is to automatically analyze and classify traffic moving over a network based on its unique properties and intrinsic nature. Earlier methods took advantage of factors like port numbers and payload analysis for categorization. However, due to the Internet's rapid growth and evolution, such techniques have proven ineffective. The work provides a comparative analysis of ML approaches for the categorization of network traffic.

D. Szostak et al [17] presented that, to provide services to users, network operators must prioritize efficient resource allocation. Its optimization provides cost savings, the required service quality, and detection of anomalies in data flow. An ML method based on a Linear Discriminant Analysis (LDA) classifier was offered for estimating short-term traffic volumes. The traffic prediction problem is therefore formulated as a classification task.

M. Soykan et al [18] presented that the Tor project is a product that enables anonymous Internet communication without disclosing users' identity. The authors first conducted data analysis to learn more about the data set, and categorical values were converted to numerical values.

Iraj Lohrasbinasab et al [19] presented that, in order to address problems, network performance is tracked using Network Traffic Monitoring and Analysis (NTMA), a collection of approaches. Forecasting network load and its future behavior is the main focus of a significant NTMA area, 'Network Traffic Prediction' (NTP). Typically, NTP techniques can be deployed using one of two methods, statistical or ML.
Kuldeep & Agrawal et al [20] presented that traditional IP traffic classification techniques, like port-number-based and payload-based direct packet inspection, are no longer widely used, due to the use of dynamic port numbers in packet headers rather than well-known port numbers and to various cryptographic techniques that prevent inspection of packet payloads. A contemporary trend is to categorize IP traffic using ML techniques. A packet capture tool was used to build a real-time internet traffic dataset with a 2-second packet collecting period.

III. AI AND DEEP LEARNING METHODS IN NETWORK TRAFFIC ANALYSIS

A. Network Traffic Analysis - To detect and counteract security threats, improve network performance, and troubleshoot network issues, NTA involves gathering, monitoring, and analyzing data on network traffic. The application of AI and DL in NTA is growing as a means of automating processes, enhancing accuracy, and generating additional insights. Network traffic, including web traffic, email traffic, and file transfer traffic, can be categorized using AI. Possible security risks like malware or distributed denial-of-service (DDoS) attacks can be identified. AI can also be used to diagnose network issues by figuring out the source of the problem, which may improve network performance and capacity planning. Examples of specific applications of AI and DL in NTA include the following. Stealthwatch is an NTA solution created by Cisco that uses AI. Stealthwatch employs ML to recognize and categorize network traffic and to find abnormalities and traffic patterns. WildFire is an AI-powered NTA solution created by Palo Alto Networks. WildFire employs ML to recognize and categorize malware, recognize malicious traffic, and stop attacks. Contrail Analytics is an NTA solution developed using AI. Contrail Analytics uses ML to recognize and categorize network traffic, find abnormalities, and improve network performance.

B. Classification methods

Supervised Learning - Supervised classification techniques are frequently used for NTA with ML. The model learns from labeled data in order to predict the class of new, unseen network traffic. The following are a few popular supervised classification techniques for NTA.

Decision Trees - For classification tasks, decision trees offer a straightforward but efficient approach. To make decisions, they divide the data into subgroups based on several attributes and build a tree-like structure. The nodes of the tree represent features, and the edges represent decisions based on those features.

Random Forest - A random forest is an ensemble method that combines multiple decision trees. To produce more precise predictions, it builds multiple decision trees during training and aggregates their results. Random forests increase overall robustness and reduce overfitting.

SVM - SVM is an effective binary classification method that handles both linear and non-linear data. It seeks to identify the hyperplane in the feature space that best separates the classes. One-vs-all or one-vs-one approaches, for example, can be used to extend SVM to multi-class classification.

Logistic Regression - Logistic regression estimates the likelihood of a particular class occurring depending on the features.

k-Nearest Neighbors (k-NN) - k-NN is a straightforward and understandable classification algorithm. When a new instance is encountered, the system examines the 'k' nearest labeled examples and assigns the class that shows up most frequently among the 'k' neighbors.

Naive Bayes - Naive Bayes is a probabilistic classification technique built on Bayes' theorem.

Neural Networks - NTA can be done using neural networks, such as convolutional neural networks (CNNs), although they need a lot of labeled data and computing power to train on complex data patterns.

Gradient Boosting - Gradient boosting is another ensemble technique that sequentially constructs a number of weak learners (typically decision trees), each of which attempts to correct the mistakes of the previous ones. It is known for its high predictive accuracy. XGBoost and LightGBM are popular gradient boosting implementations that have been scaled up and optimized for performance. Because they are so effective, they are frequently used in many classification tasks.

AdaBoost - AdaBoost is an ensemble learning technique that concentrates on hard-to-classify instances. It assigns weights to the training instances; in subsequent rounds, the weights are iteratively adjusted to give more weight to instances that were incorrectly classified.

Unsupervised Learning - Unsupervised classification, also known as clustering, is an ML approach where the goal is to group similar instances together without any predefined labels. In the context of NTA, unsupervised classification methods aim to discover underlying patterns and structures in the data without using labeled examples. Common unsupervised classification methods used for NTA are as follows.

Hierarchical Clustering: Hierarchical clustering creates a tree-like structure of nested clusters, also known as a dendrogram. It can be agglomerative (bottom-up) or divisive (top-down). At each step, the algorithm merges or splits clusters based on their similarity.

DBSCAN: DBSCAN is a density-based clustering algorithm that groups points together based on their density in the feature space. It can automatically identify outliers as noise points.

Gaussian Mixture Models (GMM): GMM models the data as a mixture of Gaussian distributions, estimates the parameters of these distributions, and assigns each point to the most probable cluster.

Self-Organizing Maps (SOM): SOM is a type of ANN that projects high-dimensional data onto a lower-dimensional grid. It organizes the data in a topological manner, grouping similar instances closer to each other on the grid.

Agglomerative Information Bottleneck (AIB): AIB is a clustering algorithm that seeks a trade-off between clustering and preserving information, which is useful for analyzing high-dimensional data like network traffic.

OPTICS (Ordering Points to Identify the Clustering Structure): OPTICS is an extension of DBSCAN that creates a reachability plot, which allows for flexible clustering based on different density levels.

BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies): BIRCH is a hierarchical clustering method that constructs a tree-like structure to efficiently cluster large datasets.

Unsupervised classification methods can be valuable for NTA, especially when there is little or no labeled data available or when exploring the data for potential patterns and anomalies. Additionally, domain expertise is often required to interpret the clustering results effectively and derive meaningful insights from the discovered patterns.

C. Online learning and offline learning

In online learning, an ML model is trained with incremental updates made as new data comes in. As a result, the model is always evolving and adapting to account for new data. Online learning is frequently used for real-time data streaming tasks like fraud detection or spam filtering. One way to train an ML model is through offline learning, which involves training the model on
static data. As a result, the model does not change as new data becomes available. Offline learning is frequently used in tasks like image classification or natural language processing, where the data is not streaming in real time.

IV. NETWORK TRAFFIC ANALYSIS TOOLS

NTA tools are software applications or platforms designed to monitor, capture, analyze, and visualize network traffic. Some popular NTA tools are listed below.

Wireshark: Wireshark is a widely used open-source packet analyzer that allows users to capture and inspect network packets in real time.
tcpdump: tcpdump is a command-line packet capture tool available on Unix-like operating systems. It can capture and display network traffic in real time or save it to a file for later analysis.
Ethereal: Ethereal is the former name of Wireshark, a packet analyzer that runs on multiple platforms and supports a broad range of protocols.
tshark: tshark is the command-line version of Wireshark, allowing users to perform packet analysis without a graphical interface.
NetFlow Analyzer: NetFlow Analyzer is a tool that collects and analyzes NetFlow data from network devices to provide insights into network traffic patterns and bandwidth usage.
PRTG Network Monitor: PRTG monitors bandwidth usage and traffic patterns and provides detailed reports.
Ntopng: Ntopng is a high-performance NTA tool that provides real-time and historical traffic data, including top talkers, protocols, and applications.
SolarWinds Network Performance Monitor: This tool offers NTA, monitoring, and alerting features to help administrators identify and resolve network issues.
Nagios: Nagios is a popular open-source network monitoring and alerting system that can be extended with plugins to perform traffic analysis.
Capsa Network Analyzer: Capsa is a user-friendly network analyzer that offers real-time and historical NTA, protocol analysis, and bandwidth monitoring.
Splunk: Splunk is a versatile platform used for log aggregation, analysis, and visualization, making it suitable for NTA when combined with appropriate plugins.
Observer (formerly Network Instruments Observer): Observer is a comprehensive network monitoring and analysis platform that provides in-depth traffic analysis capabilities.
FlowTraq: FlowTraq is a network security and traffic analysis tool that focuses on NetFlow and IPFIX data.
Graylog: Graylog is a log management and analysis platform that can be used for NTA when network logs are available.

These tools offer various features and capabilities, so the choice of the right tool depends on the specific requirements and objectives of the network analysis tasks. Some tools are more suitable for real-time monitoring and alerting, while others focus on in-depth protocol analysis and historical data examination.

V. POPULAR ML AND DL ALGORITHMS AND FRAMEWORKS FOR NTA

A. ML algorithms
SVMs: SVMs are used for NTA because they can learn complex relationships between the features of network traffic.
Decision trees: Decision trees are used for NTA because they are easy to interpret and help identify important features of network traffic.

B. DL algorithms
CNNs: CNNs are a type of DL algorithm that is well-suited for tasks that involve processing images or sequences of data. CNNs have been used for NTA to identify malicious traffic, such as malware or botnets.
Long short-term memory networks (LSTMs): LSTMs are a type of DL algorithm that is well-suited for tasks that involve processing sequential data. LSTMs have been used for NTA to identify patterns in network traffic that are indicative of malicious activity.
Generative adversarial networks (GANs): GANs are a type of DL algorithm used to generate synthetic data that is similar to real data. GANs have been used for NTA to generate synthetic network traffic used to train ML models.

C. Frameworks
TensorFlow: TensorFlow is a popular open-source framework for ML and DL. TensorFlow is well-suited for NTA because it is able to process large amounts of data quickly and efficiently.
PyTorch: PyTorch is another popular open-source framework for ML and DL. PyTorch is similar to TensorFlow, but it is often regarded as more flexible and easier to use.
Scikit-learn: Scikit-learn is a popular Python library for ML. Scikit-learn includes a number of algorithms used for NTA, such as SVMs, decision trees, and random forests.

The specific algorithms and frameworks that are used will depend on the specific task and application.

VI. IMPLEMENTATION OF REAL TIME NETWORK TRAFFIC ANALYSIS

Real-time NTA refers to the process of monitoring and analyzing network traffic as it occurs, providing immediate insights into the current state of the network. This type of analysis is crucial for detecting and responding to network anomalies, security threats, performance issues, and other events that require timely action. Real-time NTA typically involves the following steps:

Packet Capture: To perform real-time analysis, network packets need to be captured as they traverse the network. This can be achieved using packet capture tools like Wireshark or tcpdump, or by leveraging network devices that support packet capture functionality.
Packet Filtering and Processing: As packets are captured, they need to be filtered and processed to extract relevant information and features. This step may involve filtering out irrelevant traffic, dissecting protocol headers, extracting payload data, and generating flow data for efficient analysis.
Feature Extraction: Important features are extracted from the network traffic data. These features may include source and destination IP addresses, ports, protocols, packet sizes, payload
types, packet timestamps, and more. These features are crucial for building models and making real-time decisions.
Real-Time Analysis: The extracted features are analyzed as they arrive. These analyses can include identifying abnormal patterns, detecting security threats like intrusion attempts, predicting network performance issues, or classifying traffic based on application or behavior.
Alerting and Visualization: If the analysis reveals any significant events or anomalies, real-time alerts are triggered to notify network administrators or security teams. Visualization tools are also used to provide real-time graphical representations of network traffic, making it easier to interpret and respond to the data.
Continuous Monitoring: Real-time NTA is an ongoing process that requires continuous monitoring of the network. The captured data is constantly updated, and the analysis is performed in real time on the latest information.
Response and Mitigation: Based on the analysis and alerts, network administrators can take immediate action to mitigate threats, resolve performance issues, or investigate suspicious activities in real time.

Real-time NTA is essential for ensuring network security, optimizing network performance, and maintaining the overall health of the network. It allows organizations to respond quickly to incidents and make informed decisions based on up-to-date network information.

A random forest model was implemented and obtained an accuracy of 99.31%.

Fig. 1. Code snippet for confusion matrix
Fig. 2. Code snippet for training and testing of random forest model
Fig. 3. Code snippet for data normalization using scaler
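The code snippets of Figs. 1 to 3 are not reproduced in the text. As a rough illustration of the evaluation step only, the sketch below builds a confusion matrix from true and predicted labels and derives accuracy and per-class precision, recall, and F1, together with macro and weighted averages. It uses only the Python standard library. The function names (`confusion_matrix`, `per_class_scores`, `summarize`) and the toy labels are hypothetical, and this is neither the authors' code nor their dataset.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count outcomes; rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def per_class_scores(matrix):
    """Return one (precision, recall, f1) tuple per class."""
    n = len(matrix)
    scores = []
    for k in range(n):
        tp = matrix[k][k]
        predicted_k = sum(matrix[r][k] for r in range(n))  # column sum
        actual_k = sum(matrix[k])                          # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        # F1 is the harmonic mean of precision and recall.
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores.append((precision, recall, f1))
    return scores


def summarize(y_true, y_pred, labels):
    """Compute accuracy plus macro and weighted F1 averages."""
    matrix = confusion_matrix(y_true, y_pred, labels)
    scores = per_class_scores(matrix)
    support = [sum(row) for row in matrix]  # true instances per class
    total = sum(support)
    accuracy = sum(matrix[k][k] for k in range(len(labels))) / total
    # Macro average: every class counts equally.
    macro_f1 = sum(f1 for _, _, f1 in scores) / len(labels)
    # Weighted average: each class F1 is weighted by its support.
    weighted_f1 = sum(f1 * s for (_, _, f1), s in zip(scores, support)) / total
    return {"matrix": matrix, "accuracy": accuracy,
            "macro_f1": macro_f1, "weighted_f1": weighted_f1}


# Toy labels (hypothetical): 6 benign and 4 malicious flows.
labels = ["benign", "malicious"]
y_true = ["benign"] * 6 + ["malicious"] * 4
y_pred = ["benign"] * 5 + ["malicious"] + ["malicious"] * 3 + ["benign"]

for name, value in summarize(y_true, y_pred, labels).items():
    print(name, value)
```

scikit-learn, which the paper lists among its libraries, offers equivalent functionality in `sklearn.metrics` (for example `confusion_matrix` and `classification_report`); the hand-written version is used here only to keep the arithmetic visible.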
Software Requirements - Python 3.11.4 was used as the high-level language to create the system's software. The Jupyter notebook environment was used for the development process. The model was developed using the Python language's pandas, scikit-learn, seaborn, matplotlib, and NumPy libraries.

Dataset specifications and features - The properties of network traffic used to distinguish and assess network activities are referred to as network traffic features. The following uses for these features are just a few examples. Security of the network: features of network traffic can be used to spot malicious traffic, such as malware or botnets. Network troubleshooting: issues with the network, like packet loss or latency, can be resolved by using the features of the network traffic. The network traffic features that are used are:
• SERVICE - the service value set for packets; a high value is intended to ensure that the packets are routed quickly and reliably (the higher the number, the higher the reliability)
• Sport - sender's port address at the application layer
• Dport - destination port address at the application layer
• ttl - time to live
• ip_length - original length of the IPv4 packet
• ip_checksum - checksum value for error detection
• ip_id - IP identification, used to assist packet fragmentation and reassembly
• ip_offset - fragment offset: the amount of data in the preceding fragments, divided by 8
• tcp/udp/icmp length - length after fragmentation
• protocol - whether the protocol is used or not
• label - whether the label is used or not (virtual circuit approach or datagram approach)

Fig. 4. Data split about features

Figures 1 to 3 show the code snippets for training and testing the random forest model and generating the confusion matrix. Figure 4 shows the details of the data split across features. Table 1 shows the confusion matrix, which is a useful tool for understanding how well the algorithm is able to distinguish between different classes.

TABLE 1. CONFUSION MATRIX OF TRAINING AND TESTING

Training accuracy = 99.29%
Testing accuracy = 99.31%

The F1-score is a measure of a model's accuracy that considers both precision and recall. It is derived as the harmonic mean of recall and precision, giving both measures equal weight. In a multi-class classification task, averaging measurements over several classes is done using the macro average technique. It is an approach to combine the outcomes of a classification model across various classes without assigning any one class more importance. An average that considers the relative significance of various values in a data collection is called a weighted average. Each value is multiplied by a weight in a weighted average, and the total of the weighted values is then divided by the total of the weights.

VIII. DESIGN AND IMPLEMENTATION CHALLENGES

Data volume and complexity - NTA typically involves collecting and analyzing large amounts of network traffic data. This data can be very complex, as it can include a variety of different protocols, packet types, and encryption levels.
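One common way to tame per-packet data volume, in line with the flow generation mentioned under Packet Filtering and Processing, is to aggregate packets into flows keyed by the usual 5-tuple and keep only summary statistics per flow. The sketch below is a minimal illustration using only the Python standard library. The `Packet` record and the `aggregate_flows` function are hypothetical; the field names loosely follow the features named in the text (ports, protocol, ttl, ip_length, timestamps), and the sample packets are invented.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Hypothetical per-packet record with a few header-level features."""
    timestamp: float  # capture time in seconds
    src_ip: str
    dst_ip: str
    sport: int
    dport: int
    protocol: str
    ip_length: int    # bytes
    ttl: int


def aggregate_flows(packets):
    """Group packets by 5-tuple and compute simple per-flow statistics."""
    groups = defaultdict(list)
    for pkt in packets:
        key = (pkt.src_ip, pkt.dst_ip, pkt.sport, pkt.dport, pkt.protocol)
        groups[key].append(pkt)

    flows = {}
    for key, pkts in groups.items():
        times = [p.timestamp for p in pkts]
        sizes = [p.ip_length for p in pkts]
        flows[key] = {
            "packets": len(pkts),
            "bytes": sum(sizes),
            "mean_length": sum(sizes) / len(sizes),
            "duration": max(times) - min(times),
            "min_ttl": min(p.ttl for p in pkts),
        }
    return flows


# Invented sample: three packets of one TCP flow and one UDP packet.
sample = [
    Packet(0.00, "10.0.0.1", "10.0.0.2", 51000, 443, "tcp", 60, 64),
    Packet(0.05, "10.0.0.1", "10.0.0.2", 51000, 443, "tcp", 1500, 64),
    Packet(0.25, "10.0.0.1", "10.0.0.2", 51000, 443, "tcp", 540, 63),
    Packet(0.10, "10.0.0.3", "10.0.0.2", 53000, 53, "udp", 80, 128),
]

for flow_key, stats in aggregate_flows(sample).items():
    print(flow_key, stats)
```

Each flow record can then be fed to a classifier in place of the individual packets, which reduces the number of rows to process at the cost of discarding per-packet detail.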
False positives and negatives - NTA systems can generate false positives and negatives. False positives occur when the system incorrectly identifies benign traffic as malicious. False negatives occur when the system incorrectly identifies malicious traffic as benign.
Scalability - NTA systems must scale to handle large amounts of network traffic. This can be a challenge, as NTA systems typically need to be able to process data in real time.
Cost - NTA systems can be expensive to implement and maintain, since they collect and analyze large amounts of network traffic data.

A. Applications
Network security: NTA is used to mitigate security threats, such as malware, botnets, and denial-of-service attacks.
Network performance: NTA is used to identify performance bottlenecks and optimize network performance.
Network troubleshooting: NTA is used to troubleshoot network problems, such as packet loss or latency.
Compliance: NTA is used to help organizations comply with regulations, such as those related to data privacy and security.
Forensics: NTA is used to investigate security incidents and to gather evidence for legal proceedings.
Business intelligence: NTA is used to gather insights into user behavior and to improve business decision-making.

B. Future research directions
NTA is a rapidly evolving area, and there are a number of open issues and future research directions. Some of these issues include:
Heterogeneity of network traffic - As the number of devices and applications connected to the internet increases, the heterogeneity of network traffic also increases. This makes it more difficult to develop NTA systems that can accurately identify and classify all types of network traffic.
Dynamic nature of network traffic - Network traffic is constantly changing, as new applications and protocols are developed and as users' behavior changes. This dynamic nature makes it difficult to design NTA systems that can keep up with the latest changes in network traffic.
Privacy concerns - NTA systems collect and analyze large amounts of network traffic data, which can raise privacy concerns for users.

minimizing downtime caused by network issues. Optimized Performance - Through predictive analysis and traffic engineering, ML can help optimize network performance, capacity planning, and QoS management, ensuring a seamless user experience. Automated Network Management - ML models integrated into network management systems can automate decision-making processes, load balancing, and resource allocation, reducing manual intervention and improving network efficiency. Insights into User Behavior - ML analysis provides valuable insights into user behavior, allowing organizations to detect potential insider threats or suspicious activities. Efficient Traffic Classification - ML can accurately classify network traffic into different application types, aiding in bandwidth management and prioritizing critical applications. Network Forensics and Predictive Maintenance - ML assists in network forensics, enabling the investigation of past incidents, and predicts equipment failures to support proactive maintenance.

Despite the numerous benefits, NTA using ML also poses some challenges. Properly handling and preprocessing large-scale network data, selecting appropriate features, dealing with imbalanced datasets, and avoiding overfitting are some of the challenges that need to be addressed. To leverage the full potential of NTA using ML, continuous research and development in the fields of artificial intelligence, data science, and cybersecurity are vital. Additionally, collaboration between domain experts and data scientists is essential to design effective models tailored to specific network environments and use cases. As technology advances and new ML techniques emerge, NTA will continue to evolve, playing a pivotal role in securing networks, improving performance, and enabling more efficient network management for organizations across various industries.

REFERENCES

[1] Neeraj Namdev et al, "Recent Advancement in ML Based Internet Traffic Classification", Procedia Computer Science, Volume 60, 2015, pp. 784-791.
[2] Nour Alqudah et al, "ML for Traffic Analysis: A Review", Procedia Computer Science, Volume 170, 2020, pp. 911-916.
[3] A, Jamuna; S.E, Vinoth Ewards. "Survey of Traffic Classification using
Cost - NTA systems can be expensive to implement and ML” International Journal of Advanced Research in Computer Science,
pp.65-71, 2017.
maintain.
[4] T. Bujlow et al “A method for classification of network traffic based on
C5.0 ML Algorithm,” International Conference on Computing,
IX. CONCLUSION Networking and Communications (ICNC), 2012, pp. 237-241.
[5] Amin Shahraki et al “A comparative study on online ML techniques for
NTA using ML is a powerful and essential approach for network traffic streams analysis”,Computer Networks,Vol. 207, 2022,
gaining valuable insights, enhancing security, optimizing pp.1389-1286.
performance, and automating network management. By [6] A. Priya et al “An Analysis of real-time network traffic for identification
harnessing the capabilities of ML algorithms, organizations of browser and application of user using clustering
algorithm,”International Conference on Advances in Computing,
can effectively process and interpret the vast amount of Communication Control and Networking (ICACCCN), 2018, pp. 441-
network traffic data generated every day. This data-driven 445.
approach enables us to make informed decisions, respond [7] V. A. Muliukha et al “Analysis and Classification of Encrypted Network
rapidly to security threats, and improve the overall efficiency Traffic Using ML,” International Conference on Soft Computing and
Measurements (SCM), 2020, pp. 194-197.
of their networks. real time NTA implemented using random
[8] E. Nazarenko et al “Application for Traffic Classification Using ML
forest algorithm and obtained an accuracy of 99.31%. Some Algorithms,” International Conference Quality Management, Transport
key takeaways from NTA using ML are Enhanced Security- and Information Security, Information Technologies (IT&QM&IS),
ML models can detect and prevent various security threats, 2020, pp. 269-273.
including malware, intrusions, and denial-of-service attacks, [9] E. Osa et al “Comparative Analysis of ML Models in Computer Network
by analyzing network traffic patterns and identifying Intrusion Detection,” International Conference on Disruptive
Technologies for Sustainable Development (NIGERCON), 2022, pp. 1-5.
suspicious activities in real-time. Anomaly Detection-ML
algorithms can identify abnormal network behavior and
performance deviations, enabling prompt action and