AI-Driven Proactive Cloud Application Data Access Security
This paper aims to develop a novel system that leverages human and artificial intelligence for data aggregation, correlation, context and intent derivation, and consolidation of point solutions to effectively address the core purpose of cloud application data protection.

II. LITERATURE AND BACKGROUND SURVEY

Insider threats and attacks are on the rise. Sixty-eight percent of organizations have observed an increase in insider threats over the past 12 months [1], and forty-nine percent of organizations can’t detect these threats. Even when a threat is detected, it is often difficult, if not impossible, to prove due to a lack of proper forensics [2]. Compromised cloud accounts cost companies an average of $6.2 million each year and lead to 138 hours of application downtime [3]. Notably, sixty-two percent of data breaches are attributed to leveraged credentials, according to the Verizon Data Breach Investigations Report [4]. It takes an average of 287 days for an organization to identify a data breach, with the average cost amounting to $3.86 million [5]. Types of users that pose a risk include disgruntled employees, outgoing employees, employees who accidentally expose data, corporate spies, and fraudsters.

My research on the OpenDaylight Software Defined Network (SDN) controller, conducted in 2016 [6], aimed to improve the scalability of the architecture through microservices running on different instances of the controller. This research provided valuable insights for the current study. Based on the findings on SDN microservice scaling, we can employ a user activity microservice for a specific user across cloud application instances. This approach allows us to gather user activities holistically across different applications, providing a sophisticated user data feed and an intelligent analyzer that scales both horizontally and vertically. The microservice-based software architecture from the SDN research [6] significantly influenced the architecture of the proposed AI-driven cloud data security system.

Several real-world case studies prompted the need for this research, leading to the development of this paper.

In General Electric’s malicious insider case of 2020, a couple of GE employees were convicted of stealing trade secrets to gain a business advantage [7]. The employees downloaded thousands of files before leaving the organization, without the knowledge of the GE cybersecurity team. It took GE several years to discover and convict them [8], and even then, proving the case took years due to a lack of forensics.

The Marriott data breach case in January 2020 involved hackers gaining access to a third-party application through compromised credentials [9], resulting in access to Marriott guest lists containing sensitive PII [10]. Marriott was fined £18.4 million for non-compliance with GDPR requirements, and it took several months for Marriott to discover the data breach.

In the Twitter Bitcoin scam of July 2020, hackers gained access to administrative tools via compromised credentials of Twitter employees and posted scam messages from 130 private and corporate high-profile Twitter accounts [11], resulting in the transfer of $180,000 in bitcoins to scamming accounts.

The Zola hack in May 2022 involved hackers using an age-old attack technique known as credential stuffing to breach the popular wedding planning site, Zola [12]. This resulted in fraudulent activity tied to customer accounts [13], with approximately 3,000 accounts compromised. As part of its remediation efforts, Zola temporarily disabled mobile apps connected to the platform, causing business slowdowns.

III. EXISTING SYSTEM AND UNIQUENESS OF THE PROPOSED SYSTEM

Existing User Entity Behavioral Analytics (UEBA) solutions in the market and in academic research [14] fall short in four critical high-level areas:

It is hard to create a baseline because of data corruption, a lack of user data, and a lack of aggregation. Even when this challenge is addressed, there is a next layer of challenges. The data feed is not rich enough to correlate user activity across multiple applications and multiple instances of the same application. This is where my previous SDN research [6] helps the current research: the data is consolidated by having a user-specific data service run in different instances across applications. Artificial intelligence (AI) technologies such as natural language processing and generative AI models are used to quickly connect and grasp the intent and context of the user holistically.

The analytics engine is not sophisticated enough to accurately compare a user's activity against their own history and against their peers at the same point in time. Here is where our proposed system is unique. It uses machine learning (ML) models trained on user data in relation to peer data to establish a baseline, and it attaches a baseline deviation score to each user action for predictions. The covalent machine learning model, which has the user and peer scores attached, encompasses multiple inner models, such as probabilistic models and location models, that scale vertically on demand to analyze activity and assign the activity deviation scores. Additionally, the proposed solution incorporates feedback from the admin or from traditional policy-based services to train the ML model so that it proactively adapts its detections.

Once existing UEBA solutions provide anomaly scores, they do not limit the user's access; the scores are mostly used for informational visibility. Here, with our proposed system, we provide information for visibility
and simultaneously take real-time action to adjust privilege permissions for suspicious users. The system notifies the admin and the suspicious end users via email and generates a real-time "on-the-fly Policy" that is logged in the existing policy lists. When the notification reaches the admin in real time, they have the option to override the system-provided "on-the-fly Policy" and return permissions to their previous state. This approach prevents malicious users from certain access or activities without blocking productivity. In the future, suspicious end users and their managers could also be added to the notification service, and the manager could be given the option to submit a justification to the admin for one of several scenarios. Handling justifications in this decentralized manner saves the admin triaging time and eliminates the burden on the admin while allowing them to retain overall approval/disapproval authority.

In a traditional system, there is only one analytic engine, and its intelligence comes from a basic ML model or a rule-based model. This engine forms the core of the UEBA system. In our proposed system, however, the intelligence of the detection engine is decentralized: every service is autonomous from the ground up, which makes the system as a whole intelligent.

Overall, these four areas distinguish the proposed solution from existing UEBA systems; to our knowledge, it is the first of its kind.
(admins). Risk is calculated based on deviations detected in a user/entity's normal behavior pattern. An overall risk score, its corresponding range, and rank are computed for a user/entity based on various types of suspicious activity incidents.

The User/Entity Risk Model Includes:

Risk Range: Critical, High, Medium, Low, Info level
User/Entity Risk Score: (0-100)%
Risk Rank: user rank/number of employees in the organization
Peer Risk Score: (0-100)%
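
To make the risk model above concrete, the following Python sketch maps a 0-100 risk score to one of the five ranges and derives a "rank/number of employees" string. The paper names the ranges but does not give their boundaries, so the cut-offs below are purely hypothetical, as are the function names and sample users. The peer risk score is not sketched because its definition is not given here.

```python
# Hypothetical cut-offs: lowest score (inclusive) that falls into each range.
RANGE_FLOORS = [
    (90.0, "Critical"),
    (70.0, "High"),
    (40.0, "Medium"),
    (10.0, "Low"),
    (0.0, "Info"),
]


def risk_range(score):
    """Map a 0-100 user/entity risk score to its risk range label."""
    if not 0.0 <= score <= 100.0:
        raise ValueError("risk score must be between 0 and 100")
    for floor, label in RANGE_FLOORS:
        if score >= floor:
            return label


def risk_ranks(scores):
    """Return {user: 'rank/total'}, where rank 1 is the riskiest user."""
    # Ties are broken alphabetically so the ordering is deterministic.
    ordered = sorted(scores, key=lambda user: (-scores[user], user))
    total = len(ordered)
    return {user: f"{pos}/{total}" for pos, user in enumerate(ordered, start=1)}


scores = {"uma": 82.0, "neela": 35.0, "losh": 91.5}
for user, rank in risk_ranks(scores).items():
    print(user, scores[user], risk_range(scores[user]), rank)
```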
In the fourth stage, the action driver plane is responsible for taking control actions based on the visibility and recommendations provided by the analytics engine plane. This is where the response actions, involving write-to-left communication in the system, are initiated. For instance, once a policy is enforced, the notification of the incident is sent to the suspicious end user; to their manager, who has the option to provide justification; and to the admin, who can approve or revoke the on-the-fly policy in real time.

Here are Some of the Key Use Cases the System Effectively Addresses.

Potential Attacker Identification - As a cloud admin, I’d like to detect and prevent potential attacks to ensure our organization's data in the cloud remains secure.

User Access Privilege Misuse Identification - As a cloud admin, I’d like to identify and stop unnecessary privileges given to certain users and restore them to their peer norms.

Malicious insider stealing data from multiple corporate applications - An employee exfiltrates data by downloading a large number of files from Google Drive and Box, later uploading them to their personal Google Drive from a managed device.

Admin gets notified of suspicious user activities - By having both the employee's profile and their group, the normalcy-deviation detector (without any policies) tracks activities across corporate applications and flags the user as suspicious, thereby identifying malicious insiders.

Multiple devices using the same credentials to access corporate apps - Several employees from the same department and region access Salesforce corporate applications using the credentials of the same local account.

Manager gets notified of compromised account - Having device-to-user mapping, the normalcy-deviation detector (without any preset policies) tracks Salesforce application access from different devices using the same local account and flags this as a compromised account. The manager of the group gets notified and needs to deprovision the account to mitigate potential risks.

Malicious data accesses go undetected - Employees use their valid credentials to exfiltrate data; ex-employees
still use accounts that were never deprovisioned to gain access; attackers use valid user accounts to access corporate applications.

Visibility to admin - Risk score-based top risky users - The ML-driven analytics engine tracks user activities across all SaaS applications. Based on this, every user in the organization receives a risk score indicating their risk to the organization. The admin gains visibility into the top risky users, and policy recommendations are provided to monitor the risky users accessing applications, which the admin can enforce in a single click.

Remote employee accesses corporate data - An employee accesses corporate applications from a new remote location using their managed device.

Instant incident remediation with SOAR - The behavioral analytics engine flags the user as suspicious. SaaS security directly integrated with SOAR automatically executes a playbook to notify both the employee and their manager about this incident. The manager remediates the incident by marking it as a false positive, with the justification that the user has moved to this new remote location.

Malware attacker goes unrestrained - An employee continuously uploads malware files to a corporate application. The system immediately blocks the attacker and leaves future decisions to the admin. The analytics engine has already flagged this user as “high” risk. Additionally, the engine notifies the specific user uploading malware content to the corporate application, makes a policy recommendation, and enforces blocking the user from further accessing that corporate app. The admin reviews this enforcement and allows or revokes this policy in a single click.

VI. IMPLEMENTATION AND RESULTS

The overall system is implemented in the Java programming language. Machine learning models, large language models, and natural language processing are used across the system, working and interacting with the databases and the microservices. These models use a range of methodologies, starting with probability distributions and neural networks, followed by a feedback correlator and a subsequent decorrelator, to make the models more robust and accurate. Following are the key scenarios for testing:

A. Compromised Accounts (Credential Theft):

Local Accounts - Several employees from the same department and region access the Salesforce corporate application using the credentials of the same local account. Leaving loose ends would mean the employee, even after leaving the organization, could access that local account with the known shared credentials.

Is Uma really Uma? - Uma usually accesses the Box application from San Jose, California, between the hours of 8 am and 6 pm. A hacker who claims to be Uma uses her stolen credentials to access the same Box application from a different location (one Uma has never accessed it from, based on her past history) at an unusual time.

B. Privilege Misuse (Privileged User Threats):

Activity Type - Losh, who is part of the HR organization, usually can view the salary data of all employees. She also has the additional privilege to copy. Losh copies the salary information of select employees in the Sales organization, differing from the behavior of her peers.

Application Access - Neela from the engineering division has access to all corporate applications in the organization. One day, Neela accesses the Salesforce application to view the quarterly report, differing from her peers in the engineering department.

C. Data Exfiltration (Data Theft):

Bulk Activity - Pranesh has tendered his resignation. Now, Pranesh downloads marketing documents from different corporate applications, which is unusual compared to his baseline behavior. He intends to use this information in his next job.

D. Sensitive Data Exfiltration (Data Breaches):

Extensive Sensitive Data Access - Viji, who is a bank manager, downloads all customer PII data from corporate GDrive.

Broader Sensitive Data Access - Perumal is an employee in the HR department who, as part of his job duties, has to access employees' SSNs. Suddenly, we see Perumal accessing source code and HIPAA content. This is a deviation from his usual sensitive content access, which is SSNs.
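
The "Is Uma really Uma?" scenario above reduces to comparing a new login against the user's own history for that application. The Python sketch below is a deliberately simple stand-in for the ML models the paper describes: it flags a location never seen before and an hour outside the range of past login hours. The `Login` record, the function name, the flag strings, and the sample values are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Login:
    user: str
    app: str
    location: str
    hour: int  # local hour of day, 0-23


def login_deviations(history, event):
    """List the ways `event` deviates from the user's past logins to the same app."""
    past = [l for l in history if l.user == event.user and l.app == event.app]
    if not past:
        return ["no_baseline"]  # nothing to compare against yet
    reasons = []
    if event.location not in {l.location for l in past}:
        reasons.append("location")
    hours = [l.hour for l in past]
    if not min(hours) <= event.hour <= max(hours):
        reasons.append("time")
    return reasons


history = [
    Login("uma", "Box", "San Jose, California", 8),
    Login("uma", "Box", "San Jose, California", 12),
    Login("uma", "Box", "San Jose, California", 17),
]
# Hypothetical suspicious login: unseen location at 3 am.
print(login_deviations(history, Login("uma", "Box", "Unknown City", 3)))
# A login that matches the baseline yields no flags.
print(login_deviations(history, Login("uma", "Box", "San Jose, California", 10)))
```

A min/max hour window is crude (it ignores weekdays and gradual shifts in working hours), which is one reason a learned baseline with a deviation score is preferable to a fixed rule.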
File Count Deviation - Data Exfiltration/Suspicious Data Access (Volume Deviation) - App Instance Name, Activity Type, Current File Counts, Expected Value, Risk Score, View Log.

File Activity Type Deviation - Privilege Misuse/Suspicious Activity Type Access and Suspicious Application Access (Activity Type Deviation, Applications Deviation) - App Instance Name, Current Activity Types/Applications Value, Expected Value, Risk Score.

Fig 9: User and Peer Comparison Graphs for Activity Type Access

Application Access Deviation - Privilege Misuse/Suspicious Activity Type Access and Suspicious Application Access (Activity Type Deviation, Applications Deviation) - App Instance Name, Current Activity Types/Applications Value, Expected Value, Risk Score.

Fig 11: User and Peer Comparison Graphs for Application Access

Location and Time Deviation - Compromised Account/Suspicious Logins (Time Deviation, Location Deviation, Device Deviation) - App Instance Name, Current Value (Time, Location, Device), Expected Value, Risk Score, View Logs.

Fig 12: User Activity Table for Time and Location Access

Fig 13: User and Peer Comparison Graphs for Time and Location Access

Sensitive Data Access Deviation - Sensitive Data Exfiltration/Suspicious Sensitive Content Access (Data Patterns and Profiles) - App Instance Name, Activity Type, Current Value (Files/Messages/Entity), Expected Value, Risk Score, View Log (this will include Time, Location, File/Entity Name, Sensitive Data Content).

Fig 15: User and Peer Comparison Graphs for Sensitive Data Access
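
The deviation views listed above each report a current value, an expected value, and a risk score, compared against both the user and their peers. The Python sketch below shows one way such a row could be derived for the file-count case. It is illustrative only: it swaps in a simple z-score for the paper's ML models, and the function names, the row layout, the scaling of 25 points per standard deviation, and the sample numbers are all assumptions.

```python
from statistics import mean, pstdev


def deviation_score(current, baseline):
    """Return (expected value, 0-100 score) for `current` against `baseline`.

    Assumption: each standard deviation above the baseline mean adds 25
    points, capped at 100; values at or below the mean score 0.
    `baseline` must be non-empty.
    """
    expected = mean(baseline)
    spread = pstdev(baseline) or 1.0  # avoid dividing by zero on a flat baseline
    z = (current - expected) / spread
    return expected, round(max(0.0, min(100.0, z * 25.0)), 1)


def file_count_row(app_instance, activity_type, current, user_history, peer_counts):
    """Build one row of a file-count deviation view (hypothetical layout)."""
    expected, user_score = deviation_score(current, user_history)
    _, peer_score = deviation_score(current, peer_counts)
    return {
        "app_instance": app_instance,
        "activity_type": activity_type,
        "current_file_count": current,
        "expected_value": expected,
        "user_deviation_score": user_score,
        "peer_deviation_score": peer_score,
    }


# The user downloaded 14 files; their own history is [8, 12], peers did [13, 15].
row = file_count_row("box-instance-1", "download", 14, [8, 12], [13, 15])
print(row)
```

In this example the download count is unusual for the user but normal for the peer group, which is the kind of distinction a combined user and peer score is meant to surface.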
In summary, the findings of this research emphasize the importance of implementing robust security protocols to protect organizational data within cloud environments. Through an in-depth examination of various threat scenarios, we have underscored the critical need for proactive detection and mitigation strategies to safeguard against potential breaches.

Our proposed system, leveraging advanced technologies including machine learning models, large language models, and natural language processing, offers a comprehensive approach to identifying and addressing security risks. By harnessing these methodologies, we enhance the accuracy and efficiency of threat detection, enabling organizations to respond swiftly and effectively to potential security incidents.

Moreover, the system's capability to analyze user behavior, application access patterns, and data activity deviations provides administrators with valuable insights into potential security vulnerabilities. By stack-ranking entities and providing risk scores, administrators can prioritize remediation efforts and allocate resources more efficiently, thereby strengthening the organization's overall security posture.

In conclusion, this research highlights the effectiveness of an integrated approach to cloud data security, combining advanced technologies with proactive monitoring and response mechanisms. By adopting such strategies, organizations can fortify their defenses against emerging threats and mitigate the risks associated with sensitive data access and exfiltration, thus safeguarding the integrity and confidentiality of their valuable assets.

[1]. Cybersecurity Insiders, “Insider Threat Report [Gurucul],” [Online]. Available at: https://www.cybersecurity-insiders.com/wp-content/uploads/2021/06/2021-Insider-Threat-Report-Gurucul-Final-dd8f5a75.pdf. [Accessed: May-2022].
[2]. Pulse and Code 42 Survey report, “Pulse Survey: 47% of Organizations Don’t Properly Monitor Insider Risk Indicators,” [Online]. Available at: https://www.code42.com/resources/infographics/pulse-survey-forty-seven-percent-of-organizations-dont-properly-monitor-insider-risk-indicators. [Accessed: May-2022].
[3]. Ponemon LLC Research report, “The Cost of Cloud Compromise and Shadow IT,” [Online]. Available at: https://www.proofpoint.com/sites/default/files/analyst-reports/pfpt-us-ar-cost-of-cloud-compromise-and-shadow-IT.pdf. [Accessed: May-2022].
[4]. Verizon Business, “2022 Data Breach Investigations Report (DBIR),” [Online]. Available at: https://www.verizon.com/business/resources/reports/dbir/. [Accessed: May-2022].
[5]. IBM Security, “Cost of a Data Breach Report,” [Online]. Available at: https://www.ibm.com/downloads/cas/RZAX14GX. [Accessed: May-2022].
[6]. Priyanka Neelakrishnan, “Enhancing Scalability and Performance in Software-Defined Networks: An OpenDaylight (ODL) Case Study,” in Magnetism, vol. III, G.T. Rado and H. Suhl, Eds. New York: Academic, 1963, pp. 271-350.
[7]. “Famous Insider Threat Cases,” [Online]. Available at: https://gurucul.com/blog/famous-insider-threat-cases. [Accessed: June-2022].
[8]. “Trade Secret Theft,” [Online]. Available at: https://www.fbi.gov/news/stories/two-guilty-in-theft-of-trade-secrets-from-ge-072920. [Accessed: June-2022].
[9]. “Real Life Data Breaches caused by Insider Threats,” [Online]. Available at: https://www.ekransystem.com/en/blog/real-life-examples-insider-threat-caused-breaches. [Accessed: June-2022].
[10]. “Marriott International Notifies Guests of Property System Incident,” [Online]. Available at: https://news.marriott.com/news/2020/03/31/marriott-international-notifies-guests-of-property-system-incident. [Accessed: June-2022].
[11]. “2020 Twitter account Hijacking,” [Online]. Available at: https://en.wikipedia.org/wiki/2020_Twitter_account_hijacking. [Accessed: June-2022].
[12]. “Credential Stuffing,” [Online]. Available at: https://en.wikipedia.org/wiki/Credential_stuffing. [Accessed: June-2022].
[13]. “Wedding Registry site Zola says Customer Accounts were Hacked,” [Online]. Available at: https://www.nydailynews.com/2022/05/23/wedding-registry-site-zola-says-customer-accounts-were-hacked/. [Accessed: June-2022].
[14]. Jagreet Kaur; Kuldeep Kaur; Surya Kant; Sourav Das, “UEBA with Log Analytics,” IEEE 3rd International Conference on Computing, Analytics and Networks (ICAN), 2023.
PRIYANKA NEELAKRISHNAN was born on December 20, 1990. She holds a Bachelor of
Engineering degree in Electronics and Communication Engineering from Anna University, Chennai,
Tamil Nadu, India (2012); a Master of Science degree in Electrical Engineering with a focus on
Computer Networks and Network Security from San Jose State University, San Jose, California, United
States (2016); and a Master of Business Administration degree in General Management from San Jose
State University, San Jose, California, United States (2020). Currently, Priyanka works as a Product Line
Manager, Independent Researcher, and Product Innovator, specializing in driving product innovation and
development. Previously, she held positions as a Senior Product Manager and Senior Software
Development Engineer at reputable cybersecurity firms.
Priyanka Neelakrishnan is also an accomplished author, having penned the book titled “Problem Pioneering: A Product
Manager’s Guide to Crafting Compelling Solutions”. She is currently in the process of writing another book focusing on
cybersecurity.