
NETWORK SECURITY AND CRYPTOGRAPHY

(NSC)

Lab Report

Android Malware Detection with IP Coding Methods

Submitted By:
Eman Tariq
20-CP-12
Faria Raghib
20-CP-56

Submitted to:
Dr. Asim Raheel

Dated:
23/05/2024
Android Malware Detection with IP Coding Methods

Introduction:

The growing influence of telecommunication networks and the internet has revolutionized the way
organizations carry out their activities. At the same time, the rapid evolution of technology,
digitalization, cloud/fog/edge computing and quantum computing, together with the deployment of an
enormous number of connected objects, has given rise to unprecedented cybercriminal activity.
Cybercriminals, individuals or groups with malicious intent, exploit vulnerabilities in digital
systems for financial gain, espionage, disruption, and political motives. The current landscape of
cyber threats is dynamic and multifaceted, with cybercriminals continuously adapting their tactics
and techniques to exploit emerging vulnerabilities and circumvent traditional security measures.
One of the most prevalent forms of cybercrime is the proliferation of malware: malicious software
designed to infiltrate computer systems and compromise their integrity or steal sensitive
information. Malware comes in various forms, including viruses, worms, trojans, ransomware,
spyware, and adware, each posing unique threats to individuals, businesses, and governments
worldwide.

Ransomware attacks, in particular, have become increasingly prevalent and damaging in recent
years. These attacks involve cybercriminals encrypting victims' data and demanding ransom
payments in exchange for decryption keys. High-profile ransomware incidents have targeted
critical infrastructure, healthcare systems, financial institutions, and government agencies, causing
widespread disruption, financial losses, and reputational damage. The emergence of
ransomware-as-a-service (RaaS) platforms has democratized cybercrime, enabling less technically
skilled individuals to carry out sophisticated attacks with minimal effort.

Supply chain attacks represent another significant cyber threat, in which cybercriminals exploit
vulnerabilities in third-party vendors and service providers to gain unauthorized access to their
customers' networks. By compromising trusted entities within the supply chain, cybercriminals can
infiltrate target organizations, exfiltrate sensitive data, and deploy malware payloads, often with
devastating consequences.

The proliferation of cloud computing and edge computing technologies has expanded the attack
surface available to cybercriminals, presenting new challenges for cybersecurity professionals.
Misconfigured cloud instances, insecure APIs, and data breaches resulting from unauthorized
access to cloud storage repositories are just a few examples of the security risks associated with
cloud computing environments. Similarly, the rapid adoption of IoT devices has introduced new
vulnerabilities into digital ecosystems, as many IoT devices lack adequate security controls and
protocols.

In response to these evolving cyber threats, organizations must prioritize cybersecurity and
implement robust security measures to protect their digital assets and infrastructure. This includes
regular security assessments, employee training programs, incident response plans, and the
adoption of advanced security technologies such as endpoint detection and response (EDR),
network segmentation, and threat intelligence platforms.

Furthermore, collaboration and information sharing among industry stakeholders, government
agencies, and cybersecurity researchers are essential for detecting and mitigating cyber threats
effectively. By working together to identify emerging threats, share threat intelligence, and
develop proactive cybersecurity strategies, we can collectively enhance our resilience to cyber
attacks and safeguard the integrity and security of digital systems worldwide.

Dataset:

The dataset used in this study was obtained from the website of the Canadian Institute for
Cybersecurity at the University of New Brunswick. The CICAndMal2017 dataset, created by
Lashkari et al., includes 10,854 samples (4,354 malware and 6,500 benign) collected from various
sources. Through dynamic analysis conducted on real devices, 426 malware and 5,065 benign
samples were obtained. The benign software was gathered from the most popular free applications
available on the Google Play market in 2015, 2016, and 2017. The malware samples are categorized
into four types (adware, ransomware, scareware, and SMS malware), with each sample labeled
accordingly. Figure 2 illustrates the distribution of examples by attack type in the CICAndMal2017
dataset.

The breakdown of malicious applications is as follows:

Adware Malicious Applications: Includes 104 applications from families such as Ewind,
Dowgin, Gooligan, Feiwo, Shuanet, Kemoge, Youmi, Koodous, Mobidash, and Selfmite.

Ransomware Malicious Applications: Comprises 101 applications from families like Charger,
Pletor, Jisut, PornDroid, Koler, RansomBO, LockerPin, Svpeng, Simplocker, and WannaLocker.

Scareware Malicious Applications: Contains 102 applications from families including
AndroidDefender, FakeApp.AL, AndroidSpy.277, FakeAV, AV, FakeJobOffer, FakeTaoBao,
Penetho, and FakeApp.

SMS Malware Applications: Consists of 99 applications from families such as Bean Bot, Ji Fake,
Bilge, Mazarbot, FakeInst, Nandrobox, FakeMart, Plankton, FakeNotify, and SMS Sniffer.

Benign Applications: Includes 1,700 benign applications sourced from the Google Play market
in 2015-2016.

The CICAndMal2017 dataset provides 84 features along with a label (Attack, Normal). This study
aims to analyze the impact of IP addresses. To achieve this, 375,564 data points from the adware
category and 410,548 data points from the benign category of the CICAndMal2017 dataset were
combined, resulting in a dataset of 786,112 data points.
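The loading and labelling step used to build this combined dataset can be sketched with pandas as
below. This is a minimal sketch under stated assumptions, not the authors' code: the function names
`load_csv_dir` and `build_dataset`, the recursive search for `*.csv` files and the directory layout
are hypothetical. Column names in the CIC CSV files may carry leading spaces, so they are stripped,
and any existing `Label` column is overwritten with the binary label (1 = adware, 0 = benign).

```python
import glob
import os

import pandas as pd


def load_csv_dir(directory: str, label: int) -> pd.DataFrame:
    """Read every CSV under `directory` (searched recursively) into one DataFrame."""
    paths = sorted(glob.glob(os.path.join(directory, "**", "*.csv"), recursive=True))
    if not paths:
        raise FileNotFoundError(f"no CSV files found under {directory!r}")
    frames = []
    for path in paths:
        frame = pd.read_csv(path, low_memory=False)
        frame.columns = frame.columns.str.strip()  # e.g. " Source IP" -> "Source IP"
        frames.append(frame)
    df = pd.concat(frames, ignore_index=True)
    df["Label"] = label  # overwrite any original text label: 1 = adware, 0 = benign
    return df


def build_dataset(adware_dir: str, benign_dir: str) -> pd.DataFrame:
    """Stack adware (label 1) and benign (label 0) flows into one DataFrame."""
    adware = load_csv_dir(adware_dir, label=1)
    benign = load_csv_dir(benign_dir, label=0)
    return pd.concat([adware, benign], ignore_index=True)
```

For example, `data = build_dataset("CICAndMal2017/Adware", "CICAndMal2017/Benign")`, where both
paths are placeholders for wherever the CSV files were extracted.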

Implementation:

In this implementation, we preprocessed the CICAndMal2017 dataset and classified it using a
Random Forest model. First, the CSV files located in the adware and benign directories were loaded
and concatenated into single DataFrames using a custom function, and labels were added to
distinguish adware (1) from benign (0) samples. The combined dataset was then preprocessed: the
'Timestamp' column was converted to integer format, and each IP address column was split into
four separate integer features (one per octet), after which the original IP address columns were
removed. The 'Flow ID' column, an identifier rather than a traffic feature, was dropped, and all
remaining features were converted to numeric types, with missing values filled with zero. Negative
values were replaced with zero and infinite values were clipped to a finite maximum, since the
chi-squared test used for feature selection requires non-negative, finite inputs. Finally, feature
selection was performed using the SelectKBest method with the chi-squared test, keeping the top 50
features.
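The preprocessing and feature-selection steps above can be sketched as follows, assuming the
combined DataFrame has the column names of Table 1 plus a binary `Label` column. The octet column
names (`Source IP_1` to `Source IP_4`), the day-first timestamp format, the conversion of timestamps
to seconds since the Unix epoch and the ceiling `CLIP_MAX` are hypothetical choices, not taken from
the report; `SelectKBest` and `chi2` are the scikit-learn functions of those names.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

IP_COLUMNS = ("Source IP", "Destination IP")
CLIP_MAX = 1e12  # hypothetical finite ceiling that +inf and extreme values are clipped to


def split_ip(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Replace a dotted IPv4 column with four integer octet columns."""
    octets = df[column].astype(str).str.split(".", n=3, expand=True)
    octets = octets.reindex(columns=range(4))  # always exactly four parts (missing -> NaN)
    for i in range(4):
        df[f"{column}_{i + 1}"] = pd.to_numeric(octets[i], errors="coerce")
    return df.drop(columns=[column])


def preprocess(df: pd.DataFrame):
    """Return (X, y): non-negative, finite numeric features and integer labels."""
    df = df.copy().drop(columns=["Flow ID"], errors="ignore")
    if "Timestamp" in df.columns:
        ts = pd.to_datetime(df["Timestamp"], dayfirst=True, errors="coerce")
        # Seconds since the Unix epoch; unparseable timestamps become NaN.
        df["Timestamp"] = (ts - pd.Timestamp("1970-01-01")) // pd.Timedelta(seconds=1)
    for col in IP_COLUMNS:
        if col in df.columns:
            df = split_ip(df, col)
    y = df.pop("Label").astype(int)
    X = df.apply(pd.to_numeric, errors="coerce").fillna(0)
    # chi2 rejects negative or infinite inputs: negatives (and -inf) -> 0, +inf -> CLIP_MAX.
    X = X.clip(lower=0, upper=CLIP_MAX)
    return X, y


def select_features(X: pd.DataFrame, y: pd.Series, k: int = 50):
    """Keep the k columns with the highest chi-squared scores."""
    selector = SelectKBest(score_func=chi2, k=min(k, X.shape[1]))
    selector.fit(X, y)
    cols = X.columns[selector.get_support()]
    return X[cols], list(cols)
```

Typical use is `X, y = preprocess(data)` followed by `X_top, selected = select_features(X, y, k=50)`;
`selected` then lists the retained feature names.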

The preprocessed dataset was then split into training and testing sets. A Random Forest classifier
was trained on the training set and evaluated on the testing set, with performance measured by
accuracy and by the per-class precision, recall, and F1-score given in the classification report.
Table 1 lists the 84 features of the CICAndMal2017 dataset from which the final feature set was
selected.
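A minimal sketch of the training and evaluation step with scikit-learn follows. The 80/20 split,
stratification, `n_estimators=100`, the fixed `random_state` and the helper name
`train_and_evaluate` are assumptions, since the report does not state them.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split


def train_and_evaluate(X, y, test_size=0.2, seed=42):
    """Hold out a stratified test split, fit a Random Forest, and score it."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    # Per-class precision, recall and F1-score; label 0 = benign, 1 = adware.
    report = classification_report(
        y_test, y_pred, labels=[0, 1], target_names=["benign", "adware"], digits=4
    )
    return model, y_test, y_pred, accuracy, report
```

Fitting the chi-squared selector on the full dataset before the split, as described above, lets the
test rows influence which features are kept; fitting it on the training split only avoids this.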

Table:

Feature                       Feature                  Feature

Flow ID                       Fwd IAT Min              Avg Bwd Segment Size
Source IP                     Bwd IAT Total            Fwd Header Length
Source Port                   Bwd IAT Mean             Fwd Avg Bytes/Bulk
Destination IP                Bwd IAT Std              Fwd Avg Packets/Bulk
Destination Port              Bwd IAT Max              Fwd Avg Bulk Rate
Protocol                      Bwd IAT Min              Bwd Avg Bytes/Bulk
Timestamp                     Fwd PSH Flags            Bwd Avg Packets/Bulk
Flow Duration                 Bwd PSH Flags            Bwd Avg Bulk Rate
Total Fwd Packets             Fwd URG Flags            Subflow Fwd Packets
Total Backward Packets        Bwd URG Flags            Subflow Fwd Bytes
Total Length of Fwd Packets   Fwd Header Length        Subflow Bwd Packets
Total Length of Bwd Packets   Bwd Header Length        Subflow Bwd Bytes
Fwd Packet Length Max         Fwd Packets/s            Init_Win_bytes_forward
Fwd Packet Length Min         Bwd Packets/s            Init_Win_bytes_backward
Fwd Packet Length Mean        Min Packet Length        act_data_pkt_fwd
Fwd Packet Length Std         Max Packet Length        min_seg_size_forward
Bwd Packet Length Max         Packet Length Mean       Active Mean
Bwd Packet Length Min         Packet Length Std        Active Std
Bwd Packet Length Mean        Packet Length Variance   Active Max
Bwd Packet Length Std         FIN Flag Count           Active Min
Flow Bytes/s                  SYN Flag Count           Idle Mean
Flow Packets/s                RST Flag Count           Idle Std
Flow IAT Mean                 PSH Flag Count           Idle Max
Flow IAT Std                  ACK Flag Count           Idle Min
Flow IAT Max                  URG Flag Count
Flow IAT Min                  CWE Flag Count
Fwd IAT Total                 ECE Flag Count
Fwd IAT Mean                  Down/Up Ratio
Fwd IAT Std                   Average Packet Size
Fwd IAT Max                   Avg Fwd Segment Size

Table 1: CICAndMal2017 dataset features

Results:

After preprocessing the CICAndMal2017 dataset and classifying it with a Random Forest model, the
results were evaluated in terms of accuracy and other classification metrics. The dataset,
consisting of flows from both adware and benign applications, was split into training and testing
sets, and the trained classifier was evaluated on the held-out test set. Its accuracy score reflects
how well the model distinguishes adware from benign samples, while the classification report gives
the precision, recall, and F1-score for each class (adware and benign). These results highlight the
contribution of the preprocessing steps, including the handling of IP addresses and timestamps and
the feature selection. The Random Forest model demonstrated strong performance, suggesting that
the feature engineering and selection processes contributed significantly to its accuracy and
overall classification success. The exact numerical results, namely the accuracy score and the
per-class metrics, were obtained from the classification report and are shown in the figures below.

Figure 1: Finalized features


Figure 2: Accuracy

Figure 2 shows the accuracy score achieved by the Random Forest classifier. To evaluate the model
further, the confusion matrix and the graphical plots below show how well it performed on the
given dataset.

Confusion Matrix:

Figure 3: Confusion matrix plot


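The confusion matrix breaks down the model's predictions on the test set into correct and incorrect
classifications for each class (adware and benign); its values can also be normalized by the true
class to show per-class rates.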
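Both the raw and the normalized confusion matrix can be computed with scikit-learn's
`confusion_matrix`; passing `normalize="true"` divides each row by the number of test samples in
that true class. The label order (0 = benign, 1 = adware) follows the labelling described under
Implementation, and the helper name `confusion_tables` is hypothetical.

```python
from sklearn.metrics import confusion_matrix


def confusion_tables(y_true, y_pred):
    """Return raw counts, row-normalized rates (rows = true class) and the four cells."""
    labels = [0, 1]  # 0 = benign, 1 = adware
    counts = confusion_matrix(y_true, y_pred, labels=labels)
    # normalize="true" divides each row by the number of samples in that true class.
    rates = confusion_matrix(y_true, y_pred, labels=labels, normalize="true")
    tn, fp, fn, tp = (int(v) for v in counts.ravel())  # adware (1) is the positive class
    return counts, rates, {"tn": tn, "fp": fp, "fn": fn, "tp": tp}
```

`ConfusionMatrixDisplay.from_predictions` (scikit-learn 1.0 and later) can draw either version as a
plot.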

Graphical Plots:

Figure 4: F1-score, precision and recall

Figure 4 plots the F1-score, precision, and recall for each class.
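For each class, precision = TP / (TP + FP), recall = TP / (TP + FN), and the F1-score is their
harmonic mean, 2PR / (P + R). A minimal sketch using scikit-learn's
`precision_recall_fscore_support` returns these per-class values in a form that a bar chart such as
Figure 4 could display; the helper name `per_class_scores` and the class names are assumptions.

```python
from sklearn.metrics import precision_recall_fscore_support


def per_class_scores(y_true, y_pred):
    """Precision, recall, F1-score and support for each class."""
    precision, recall, f1, support = precision_recall_fscore_support(
        y_true, y_pred, labels=[0, 1], zero_division=0
    )
    names = ["benign", "adware"]  # label 0, label 1
    return {
        name: {"precision": p, "recall": r, "f1": f, "support": int(s)}
        for name, p, r, f, s in zip(names, precision, recall, f1, support)
    }
```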
References:

1. Bayazit, E. C., Sahingoz, O. K., & Dogan, B. (2021, June). Neural network based Android malware
detection with different IP coding methods. In 2021 3rd International Congress on Human-Computer
Interaction, Optimization and Robotic Applications (HORA) (pp. 1-6). IEEE.
2. Noorbehbahani, F., & Saberi, M. (2020, October). Ransomware detection with semi-supervised
learning. In 2020 10th International Conference on Computer and Knowledge Engineering
(ICCKE) (pp. 024-029). IEEE.
3. Chen, R., Li, Y., & Fang, W. (2019, July). Android malware identification based on traffic analysis.
In International conference on artificial intelligence and security (pp. 293-303). Cham: Springer
International Publishing.
4. Bayazit, E. C., Sahingoz, O. K., & Dogan, B. (2022, June). A deep learning based android malware
detection system with static analysis. In 2022 International Congress on Human-Computer Interaction,
Optimization and Robotic Applications (HORA) (pp. 1-6). IEEE.
5. Arslan, R. S. (2021, October). Identify type of android malware with machine learning based ensemble
model. In 2021 5th international symposium on multidisciplinary studies and innovative technologies
(ISMSIT) (pp. 628-632). IEEE.
